diff --git a/03-visualization.Rmd b/03-visualization.Rmd index 8ad82d359..4d7c9d743 100755 --- a/03-visualization.Rmd +++ b/03-visualization.Rmd @@ -15,7 +15,7 @@ knitr::opts_chunk$set( fig.height = 4, fig.align='center', warning = FALSE - ) +) options(scipen = 99, digits = 3) @@ -29,7 +29,7 @@ set.seed(76) We begin the development of your data science toolbox with data visualization. By visualizing our data, we gain valuable insights that we couldn't initially see from just looking at the raw data in spreadsheet form. We will use the `ggplot2` package as it provides an easy way to customize your plots. `ggplot2` is rooted in the data visualization theory known as _The Grammar of Graphics_ [@wilkinson2005]. -At the most basic level, graphics/plots/charts (we use these terms interchangeably in this book) provide a nice way for us to get a sense for how quantitative variables compare in terms of their center (where the values tend to be located) and their spread (how they vary around the center). Graphics should be designed to emphasise the findings and insight you want your audience to understand. This does however require a balancing act. On the one hand, you want to highlight as many meaningful relationships and interesting findings as possible; on the other you don't want to include so many as to overwhelm your audience. +At the most basic level, graphics/plots/charts (we use these terms interchangeably in this book) provide a nice way for us to get a sense for how quantitative variables compare in terms of their center (where the values tend to be located) and their spread (how they vary around the center). Graphics should be designed to emphasize the findings and insight you want your audience to understand. This does however require a balancing act. On the one hand, you want to highlight as many meaningful relationships and interesting findings as possible; on the other you don't want to include so many as to overwhelm your audience. As we will see, plots/graphics also help us to identify patterns and outliers in our data. We will see that a common extension of these ideas is to compare the *distribution* of one quantitative variable (i.e., what the spread of a variable looks like or how the variable is *distributed* in terms of its values) as we go across the levels of a different categorical variable. @@ -54,13 +54,13 @@ library(readr) ---- +*** ## The Grammar of Graphics {#grammarofgraphics} -We begin with a discussion of a theoretical framework for data visualization known as "The Grammar of Graphics," which serves as the foundation for the `ggplot2` package. Think of how we construct sentences in english to form sentences by combining different elements, like nouns, verbs, particles, subjects, objects, etc. However, we can't just combine these elements in any arbitrary order; we must do so following a set of rules known as a linguistic grammar. Similarly to a linguistic grammar, "The Grammar of Graphics" define a set of rules for contructing *statistical graphics* by combining different types of *layers*. This grammar was created by Leland Wilkinson [@wilkinson2005] and has been implemented in a variety of data visualization software including R. +We begin with a discussion of a theoretical framework for data visualization known as "The Grammar of Graphics," which serves as the foundation for the `ggplot2` package. Think of how we construct sentences in English to form sentences by combining different elements, like nouns, verbs, particles, subjects, objects, etc. 
However, we can't just combine these elements in any arbitrary order; we must do so following a set of rules known as a linguistic grammar. Similarly to a linguistic grammar, "The Grammar of Graphics" define a set of rules for constructing *statistical graphics* by combining different types of *layers*. This grammar was created by Leland Wilkinson [@wilkinson2005] and has been implemented in a variety of data visualization software including R. ### Components of the Grammar @@ -165,7 +165,7 @@ There are other components of the Grammar of Graphics we can control as well. A - `stat`istical transformations: this includes smoothing, binning values into a histogram, or no transformation at all (known as the `"identity"` transformation). --> -Other more complex components like `scales` and `coord`inate systems are left for a more advanced text such as [R for Data Science](http://r4ds.had.co.nz/data-visualisation.html#aesthetic-mappings) [@rds2016]. Generally speaking, the Grammar of Graphics allows for a high degree of customization of plots and also a consistent framework for easily updating and modifiying them. +Other more complex components like `scales` and `coord`inate systems are left for a more advanced text such as [R for Data Science](http://r4ds.had.co.nz/data-visualisation.html#aesthetic-mappings) [@rds2016]. Generally speaking, the Grammar of Graphics allows for a high degree of customization of plots and also a consistent framework for easily updating and modifying them. ### ggplot2 package @@ -180,7 +180,7 @@ Let's now put the theory of the Grammar of Graphics into practice. ---- +*** @@ -198,7 +198,7 @@ We will discuss some variations of these plots, but with this basic repertoire o ---- +*** @@ -367,7 +367,7 @@ With medium to large data sets, you may need to play around with the different m --> ---- +*** ## 5NG#2: Linegraphs {#linegraphs} @@ -438,11 +438,11 @@ Much as with the `ggplot()` code that created the scatterplot of departure and a ### Summary -Linegraphs, just like scatterplots, display the relationship between two numerical variables. However it is preferred to use lingraphs over scatterplots when the variable on the x-axis (i.e. the explanatory variable) has an inherent ordering, like some notion of time. +Linegraphs, just like scatterplots, display the relationship between two numerical variables. However it is preferred to use linegraphs over scatterplots when the variable on the x-axis (i.e. the explanatory variable) has an inherent ordering, like some notion of time. ---- +*** @@ -491,7 +491,7 @@ The remaining bins all have a similar interpretation. ### Histograms via geom_histogram {#geomhistogram} -Let's now present the `ggplot()` code to plot your first histogram! Unlike with scatterplots and linegraphs, there is now only one variable being mapped in `aes()`: the single numerical variable `temp`. The y-aesthetic of a histogram gets computed for you automatically. Furthemore, the geometric object layer is now a `geom_histogram()` +Let's now present the `ggplot()` code to plot your first histogram! Unlike with scatterplots and linegraphs, there is now only one variable being mapped in `aes()`: the single numerical variable `temp`. The y-aesthetic of a histogram gets computed for you automatically. 
Furthermore, the geometric object layer is now a `geom_histogram()` ```{r weather-histogram, warning=TRUE, fig.cap="Histogram of hourly temperatures at three NYC airports."} ggplot(data = weather, mapping = aes(x = temp)) + @@ -524,7 +524,7 @@ Observe in both Figure \@ref(fig:weather-histogram-2) and Figure \@ref(fig:weath Using the first method, we have the power to specify how many bins we would like to cut the x-axis up in. As mentioned in the previous section, the default number of bins is 30. We can override this default, to say 40 bins, as follows: -```{r, warning=FALSE, message=FALSE, fig.cap= "Histogram with 60 bins."} +```{r, warning=FALSE, message=FALSE, fig.cap= "Histogram with 40 bins."} ggplot(data = weather, mapping = aes(x = temp)) + geom_histogram(bins = 40, color = "white") ``` @@ -558,13 +558,13 @@ Histograms, unlike scatterplots and linegraphs, present information on only a si ---- +*** ## Facets {#facets} -Before continuing the 5NG, let's briefly introduce a new concept called *faceting*. Faceting is used when we'd like to split a particular visualization of variables by another variable. This will create mutiple copies of the same type of plot with matching x and y axes, but whose content will differ. +Before continuing the 5NG, let's briefly introduce a new concept called *faceting*. Faceting is used when we'd like to split a particular visualization of variables by another variable. This will create multiple copies of the same type of plot with matching x and y axes, but whose content will differ. For example, suppose we were interested in looking at how the histogram of hourly temperature recordings at the three NYC airports we saw in Section \@ref(histograms) differed by month. We would "split" this histogram by the 12 possible months in a given year, in other words plot histograms of `temp` for each `month`. We do this by adding `facet_wrap(~ month)` layer. @@ -574,7 +574,7 @@ ggplot(data = weather, mapping = aes(x = temp)) + facet_wrap(~ month) ``` -Note the use of the tilde `~` before `month` in `facet_wrap()`. The tilde is required and you'll receive the error `Error in as.quoted(facets) : object 'month' not found` if you don't include it before `month` here. We can also specify the number of rows and columns in the grid by using the `nrow` and `ncol` arguments inside of `facet_wrap()`. For example, say we would like our facetted plot to have 4 rows instead of 3. Add the `nrow = 4` argument to `facet_wrap(~ month)` +Note the use of the tilde `~` before `month` in `facet_wrap()`. The tilde is required and you'll receive the error `Error in as.quoted(facets) : object 'month' not found` if you don't include it before `month` here. We can also specify the number of rows and columns in the grid by using the `nrow` and `ncol` arguments inside of `facet_wrap()`. For example, say we would like our faceted plot to have 4 rows instead of 3. Add the `nrow = 4` argument to `facet_wrap(~ month)` ```{r facethistogram2, fig.cap="Faceted histogram with 4 instead of 3 rows."} ggplot(data = weather, mapping = aes(x = temp)) + @@ -601,7 +601,7 @@ Observe in both Figure \@ref(fig:facethistogram) and Figure \@ref(fig:facethisto ---- +*** @@ -732,7 +732,7 @@ It is important to keep in mind that the definition of an outlier is somewhat ar **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Which months have the highest variability in temperature? What reasons can you give for this? 
-**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** We looked at the distribution of a numerical variable over a categorical variable here with this boxplot. Why can't we look at the distribution of one numerical variable over the distribution of another numerical variable? Say, temperature across pressure, for example? +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** We looked at the distribution of the numerical variable `temp` split by the numerical variable `month` that we converted to a categorical variable using the `factor()` function. Why would a boxplot of `temp` split by the numerical variable `pressure` similarly converted to a categorical variable using the `factor()` not be informative? **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram? @@ -745,7 +745,7 @@ Side-by-side boxplots provide us with a way to compare and contrast the distribu ---- +*** @@ -985,7 +985,7 @@ Barplots are the preferred way of displaying the distribution of a categorical v ---- +*** @@ -1096,7 +1096,7 @@ ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = temp)) + geom_line() ``` -These two code segments were a preview of Chapter \@ref(wrangling) on data wrangling where we'll delve further into the `dplyr` package. Data wrangling is the process of transforming and modifying existing data to with the intent of making it more appropriate for analysis purposes. For example, the two code segments used the `filter()` function to create new data frames (`alaska_flights` and `early_january_weather`) by choosing only a subset of rows of existing data frames (`flights` and `weather`). In this next chapter, we'll formally introduce the `filter()` and other data wrangling functions as well as the *pipe operator* `%>%` which allows you to combine multiple data wrangling actions into a single sequential *chain* of actions. On to Chapter \@ref(wrangling) on data wrangling! +These two code segments were a preview of Chapter \@ref(wrangling) on data wrangling where we'll delve further into the `dplyr` package. Data wrangling is the process of transforming and modifying existing data with the intent of making it more appropriate for analysis purposes. For example, the two code segments used the `filter()` function to create new data frames (`alaska_flights` and `early_january_weather`) by choosing only a subset of rows of existing data frames (`flights` and `weather`). In this next chapter, we'll formally introduce the `filter()` and other data wrangling functions as well as the *pipe operator* `%>%` which allows you to combine multiple data wrangling actions into a single sequential *chain* of actions. On to Chapter \@ref(wrangling) on data wrangling! 
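+As a parting sketch of what such a sequential chain of actions can look like (assuming the `nycflights13`, `dplyr`, and `ggplot2` packages are loaded), here is one possible way to recreate the Alaska Airlines scatterplot from this chapter in a single pipeline; each step is explained properly in Chapter \@ref(wrangling):
+
+```{r, eval=FALSE}
+# Keep only Alaska Airlines flights, then pipe the result directly into ggplot():
+flights %>% 
+  filter(carrier == "AS") %>% 
+  ggplot(mapping = aes(x = dep_delay, y = arr_delay)) + 
+  geom_point()
+```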
```{r echo=FALSE, fig.cap="ModernDive flowchart", out.width='110%', fig.align='center'} # knitr::include_graphics("images/flowcharts/flowchart/flowchart.004.png") diff --git a/04-wrangling.Rmd b/04-wrangling.Rmd index 7b4cf5a69..4b49905f5 100755 --- a/04-wrangling.Rmd +++ b/04-wrangling.Rmd @@ -13,7 +13,7 @@ knitr::opts_chunk$set( fig.height = 4, fig.align='center', warning = FALSE - ) +) options(scipen = 99, digits = 3) @@ -25,7 +25,7 @@ options(knitr.kable.NA = '') set.seed(76) ``` -So far in our journey, we've seen how to look at data saved in data frames using the `glimpse()` and `View()` functions in Chapter \@ref(getting-started) on and how to create data visualizations using the `ggplot2` package in Chapter \@ref(viz). In particular we study what we term the "five named graphs" (5NG): +So far in our journey, we've seen how to look at data saved in data frames using the `glimpse()` and `View()` functions in Chapter \@ref(getting-started) on and how to create data visualizations using the `ggplot2` package in Chapter \@ref(viz). In particular we studied what we term the "five named graphs" (5NG): 1. scatterplots via `geom_point()` 1. linegraphs via `geom_line()` @@ -33,9 +33,9 @@ So far in our journey, we've seen how to look at data saved in data frames using 1. histograms via `geom_histogram()` 1. barplots via `geom_bar()` or `geom_col()` -We created these visualization using the "Grammar of Graphics", which maps variables in a data frame to the aesthetic attributes of the above 5 `geom`etric objects. We can also control other aesthetic attributes of the geometric objects such as the size and color as seen in the Gapminder data example in Figure \@ref(fig:gapminder). +We created these visualizations using the "Grammar of Graphics", which maps variables in a data frame to the aesthetic attributes of one the above 5 `geom`etric objects. We can also control other aesthetic attributes of the geometric objects such as the size and color as seen in the Gapminder data example in Figure \@ref(fig:gapminder). -Furthermore in Section \@ref(whats-to-come-3) we discussed that for two of our visualizations, we needed transformed/modified versions of existing data frames. Recall for example the scatterplot of departure and arrival delay *only* for Alaska Airlines flights. In order to create this visualization, we needed to first pare down the `flights` data frame to a new data frame `alaska_flights` consisting of only `carrier == AS` flights using the `filter()` function. +Recall however in Section \@ref(whats-to-come-3) we discussed that for two of our visualizations we needed transformed/modified versions of existing data frames. Recall for example the scatterplot of departure and arrival delay *only* for Alaska Airlines flights. In order to create this visualization, we needed to first pare down the `flights` data frame to a new data frame `alaska_flights` consisting of only `carrier == "AS"` flights using the `filter()` function. ```{r, eval=FALSE} alaska_flights <- flights %>% @@ -48,13 +48,15 @@ ggplot(data = alaska_flights, mapping = aes(x = dep_delay, y = arr_delay)) + In this chapter, we'll introduce a series of functions from the `dplyr` package that will allow you to take a data frame and 1. `filter()` its existing rows to only pick out a subset of them. For example, the `alaska_flights` data frame above. -1. `summarize()` one of its columns/variables with a *summary statistic*. 
For example, the median and interquartile range of temperatures as we saw in Section \@ref(boxplots) on boxplots. -1. `group_by()` its rows. In other words assign different rows to be part of the same *group* and thus report summary statistics for each group separately. For example, perhaps you want not the overall average departure delay `dep_delay` for all three `origin` airports combined, but the average departure delay for each of the three `origin` airports separately. +1. `summarize()` one of its columns/variables with a *summary statistic*. Examples include the median and interquartile range of temperatures as we saw in Section \@ref(boxplots) on boxplots. +1. `group_by()` its rows. In other words assign different rows to be part of the same *group* and report summary statistics for each group separately. For example, say perhaps you don't want a single overall average departure delay `dep_delay` for all three `origin` airports combined, but rather three separate average departure delays, one for each of the three `origin` airports. 1. `mutate()` its existing columns/variables to create new ones. For example, convert hourly temperature recordings from °F to °C. 1. `arrange()` its rows. For example, sort the rows of `weather` in ascending or descending order of `temp`. 1. `join()` it with another data frame by matching along a "key" variable. In other words, merge these two data frames together. -Notice how we used computer code type font to describe the actions we want to take on our data frames. This is because the `dplyr` package have intuitively verb-named functions that are easy to remember. We'll start by introducing the pipe operator `%>%`, which allows you to combine multiple data wrangling verb-named functions into a single sequential *chain* of actions. +Notice how we used `computer code` font to describe the actions we want to take on our data frames. This is because the `dplyr` package for data wrangling that we'll introduce in this chapter has intuitively verb-named functions that are easy to remember. + +We'll start by introducing the pipe operator `%>%`, which allows you to combine multiple data wrangling verb-named functions into a single sequential *chain* of actions. ### Needed packages {-} @@ -76,13 +78,13 @@ library(readr) ---- +*** ## The pipe operator: `%>%` {#piping} -Before we dig into data wrangling, let's first introduce a very nifty tool that gets loaded along with the `dplyr` package: the pipe operator `%>%`. Let's say you would like to perform this sequence of operations in R: +Before we start data wrangling, let's first introduce a very nifty tool that gets loaded along with the `dplyr` package: the pipe operator `%>%`. Say you would like to perform a hypothetical sequence of operations on a hypothetical data frame `x` using hypothetical functions `f()`, `g()`, and `h()`: 1. Take `x` *then* 1. Use `x` as an input to a function `f()` *then* @@ -95,7 +97,7 @@ One way to achieve this sequence of operations is by using nesting parentheses a h(g(f(x))) ``` -In this case, the above code isn't so hard to read since we are applying only three functions: `f()`, then `g()`, then `h()`. However, you can imagine this can get progressively harder and harder to read as the number of functions applied in your sequence increases. This is where the pipe operator `%>%` (pronounced "then") comes in handy. `%>%` takes one output of one function and then "pipes" it to be the input of the next function. 
For example: you can obtain the same output as the above sequence of operations as follows: +The above code isn't so hard to read since we are applying only three functions: `f()`, then `g()`, then `h()`. However, you can imagine that this can get progressively harder and harder to read as the number of functions applied in your sequence increases. This is where the pipe operator `%>%` comes in handy. `%>%` takes one output of one function and then "pipes" it to be the input of the next function. Furthermore, a helpful trick is to read `%>%` as "then." For example, you can obtain the same output as the above sequence of operations as follows: ```{r, eval = FALSE} x %>% @@ -111,7 +113,7 @@ You would read this above sequence as: 1. Use this output as the input to the next function `g()` *then* 1. Use this output as the input to the next function `h()` -So while both approaches above would achieve the same goal, the latter is much more human-readable because you can read the sequence of operations line-by-line. But what are `x`, `f()`, `g()`, and `h()`? Throughout this chapter on data wrangling: +So while both approaches above would achieve the same goal, the latter is much more human-readable because you can read the sequence of operations line-by-line. But what are the hypothetical `x`, `f()`, `g()`, and `h()`? Throughout this chapter on data wrangling: * The starting value `x` will be a data frame. For example: `flights`. * The sequence of functions, here `f()`, `g()`, and `h()`, will be a sequence of any number of the 6 data wrangling verb-named functions we listed in the introduction to this chapter. For example: `filter(carrier == "AS")`. @@ -124,12 +126,11 @@ alaska_flights <- flights %>% filter(carrier == "AS") ``` +Keep in mind, there are many more advanced data wrangling functions than just the 6 listed in the introduction to this chapter; you'll see some examples of these near in Section \@ref(other-verbs). However, just with these 6 verb-named functions you'll be able to perform a broad array of data wrangling tasks for the rest of this book. -Keep in mind, there are many more advanced data wrangling functions than just the 6 listed in the introduction to this chapter; you'll see some examples of these near in Section \@ref(other-verbs). However, just with these 6 verb-named functions you'll be able to perform a broad array of data wrangling tasks. - ---- +*** @@ -139,7 +140,7 @@ Keep in mind, there are many more advanced data wrangling functions than just th knitr::include_graphics("images/filter.png") ``` -The `filter()` function here works much like the "Filter" option in Microsoft Excel; it allows you to specify criteria about values of a variable in your dataset and then chooses only those rows that match that criteria. We begin by focusing only on flights from New York City to Portland, Oregon. The `dest` code (or airport code) for Portland, Oregon is `"PDX"`. Run the following and look at the resulting spreadsheet to ensure that only flights heading to Portland are chosen here: +The `filter()` function here works much like the "Filter" option in Microsoft Excel; it allows you to specify criteria about the values of a variables in your dataset and then filters out only those rows that match that criteria. We begin by focusing only on flights from New York City to Portland, Oregon. The `dest` code (or airport code) for Portland, Oregon is `"PDX"`. 
Run the following and look at the resulting spreadsheet to ensure that only flights heading to Portland are chosen here: ```{r, eval=FALSE} portland_flights <- flights %>% @@ -150,38 +151,48 @@ View(portland_flights) Note the following: * The ordering of the commands: - + Take the data frame `flights` *then* + + Take the `flights` data frame `flights` *then* + `filter` the data frame so that only those where the `dest` equals `"PDX"` are included. -* The double equal sign `==` for testing for equality, and not `=`. You are almost guaranteed to make the mistake at least once of only including one equals sign. - -You can combine multiple criteria together using operators that make comparisons: - -- `|` corresponds to "or" -- `&` corresponds to "and" - -We can often skip the use of `&` and just separate our conditions with a comma. You'll see this in the example below. +* We test for equality using the double equal sign `==` and not a single equal sign `=`. In other words `filter(dest = "PDX")` will yield an error. This is a convention across many programming languages. If you are new to coding, you'll probably forget to use the double equal sign `==` a few times before you get the hang of it. -In addition, you can use other mathematical checks (similar to `==`): +You can use other mathematical operations beyond just `==` to form criteria: - `>` corresponds to "greater than" - `<` corresponds to "less than" - `>=` corresponds to "greater than or equal to" - `<=` corresponds to "less than or equal to" -- `!=` corresponds to "not equal to" +- `!=` corresponds to "not equal to". The `!` is used in many programming languages to indicate "not". -To see many of these in action, let's select all flights that left JFK airport heading to Burlington, Vermont (`"BTV"`) or Seattle, Washington (`"SEA"`) in the months of October, November, or December. Run the following +Furthermore, you can combine multiple criteria together using operators that make comparisons: + +- `|` corresponds to "or" +- `&` corresponds to "and" + +To see many of these in action, let's filter `flights` for all rows that: + +* Departed from JFK airport and +* Were heading to Burlington, Vermont (`"BTV"`) or Seattle, Washington (`"SEA"`) and +* Departed in the months of October, November, or December. + +Run the following: ```{r, eval=FALSE} btv_sea_flights_fall <- flights %>% - filter(origin == "JFK", - dest == "BTV" | dest == "SEA", - month >= 10) + filter(origin == "JFK" & (dest == "BTV" | dest == "SEA") & month >= 10) View(btv_sea_flights_fall) ``` -Note: even though colloquially speaking one might say "all flights leaving Burlington, Vermont *and* Seattle, Washington," in terms of computer logical operations, we really mean "all flights leaving Burlington, Vermont *or* Seattle, Washington." For a given row in the data, `dest` can be "BTV", "SEA", or something else, but not "BTV" and "SEA" at the same time. +Note that even though colloquially speaking one might say "all flights leaving Burlington, Vermont *and* Seattle, Washington," in terms of computer operations, we really mean "all flights leaving Burlington, Vermont *or* leaving Seattle, Washington." For a given row in the data, `dest` can be "BTV", "SEA", or something else, but not "BTV" and "SEA" at the same time. Furthermore, note the careful use of parentheses around the `dest == "BTV" | dest == "SEA"`. -Another example uses the `!` to pick rows that *don't* match a condition. The `!` can be read as "not." 
Here we are selecting rows corresponding to flights that didn't go to Burlington, VT or Seattle, WA. +We can often skip the use of `&` and just separate our conditions with a comma. In other words the code above will return the identical output `btv_sea_flights_fall` as this code below: + +```{r, eval=FALSE} +btv_sea_flights_fall <- flights %>% + filter(origin == "JFK", (dest == "BTV" | dest == "SEA"), month >= 10) +View(btv_sea_flights_fall) +``` + +Let's present another example that uses the `!` "not" operator to pick rows that *don't* match a criteria. As mentioned earlier, the `!` can be read as "not." Here we are filtering rows corresponding to flights that didn't go to Burlington, VT or Seattle, WA. ```{r, eval=FALSE} not_BTV_SEA <- flights %>% @@ -189,6 +200,15 @@ not_BTV_SEA <- flights %>% View(not_BTV_SEA) ``` +Again, note the careful use of parentheses around the `(dest == "BTV" | dest == "SEA")`. If we didn't use parentheses as follows: + +```{r, eval=FALSE} +flights %>% + filter(!dest == "BTV" | dest == "SEA") +``` + +We would be returning all flights not headed to `"BTV"` *or* those headed to `"SEA"`, which is an entirely different resulting data frame. + Now say we have a large list of airports we want to filter for, say `BTV`, `SEA`, `PDX`, `SFO`, and `BDL`. We could continue to use the `|` or operator as so: ```{r, eval=FALSE} @@ -197,7 +217,7 @@ many_airports <- flights %>% View(many_airports) ``` -but as we progressively include more airports, this will get unwieldly. A slightly shorter approach uses the `%in%` operator: +but as we progressively include more airports, this will get unwieldy. A slightly shorter approach uses the `%in%` operator: ```{r, eval=FALSE} many_airports <- flights %>% @@ -205,28 +225,28 @@ many_airports <- flights %>% View(many_airports) ``` -What this code is doing is its filtering for all flights where `dest` is in the list of airports `c("BTV", "SEA", "PDX", "SFO", "BDL")`. Both outputs of `many_airports` are the same, but as you can see the latter takes much less time to code. +What this code is doing is filtering `flights` for all flights where `dest` is in the list of airports `c("BTV", "SEA", "PDX", "SFO", "BDL")`. Recall from Chapter \@ref(getting-started) that the `c()` function "combines" or "concatenates" values in a vector of values. Both outputs of `many_airports` are the same, but as you can see the latter takes much less time to code. -As a final note we point out that `filter()` should often be among the first verbs you apply to your data. This cleans your dataset to only those rows you care about, or put differently, it narrows down the scope to just the observations your care about. +As a final note we point out that `filter()` should often be among the first verbs you apply to your data. This cleans your dataset to only those rows you care about, or put differently, it narrows down the scope of your data frame to just the observations your care about. ```{block lc-filter, type='learncheck', purl=FALSE} **_Learning check_** ``` -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What's another way using the "not" operator `!` we could filter only the rows that are not going to Burlington, VT nor Seattle, WA in the `flights` data frame? Test this out using the code above. +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What's another way of using the "not" operator `!` to filter only the rows that are not going to Burlington VT nor Seattle WA in the `flights` data frame? Test this out using the code above. 
```{block, type='learncheck', purl=FALSE} ``` ---- +*** ## `summarize` variables {#summarize} -The next common task when working with data is to be able to summarize data: take a large number of values and summarize them with a single value. While this may seem like a very abstract idea, something as simple as the sum, the smallest value, and the largest values are all summaries of a large number of values. +The next common task when working with data is to return *summary statistics*: a single numerical value that summarizes a large number of values, for example the mean/average or the median. Other examples of summary statistics that might not immediately come to mind include the sum, the smallest value AKA the minimum, the largest value AKA the maximum, and the standard deviation; they are all summaries of a large number of values. ```{r sum1, echo=FALSE, fig.cap="Summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet", purl=FALSE} knitr::include_graphics("images/summarize1.png") @@ -237,25 +257,15 @@ knitr::include_graphics("images/summary.png") options(knitr.kable.NA = 'NA') ``` -We can calculate the standard deviation and mean of the temperature variable `temp` in the `weather` data frame of `nycflights13` in one step using the `summarize` (or equivalently using the UK spelling `summarise`) function in `dplyr` (See Appendix \@ref(appendixA)): +Let's calculate the mean and the standard deviation of the temperature variable `temp` in the `weather` data frame included in the `nycflights13` package (See Appendix \@ref(appendixA)). We'll do this in one step using the `summarize()` function from the `dplyr` package and save the results in a new data frame `summary_temp` with columns/variables `mean` and the `std_dev`. Note you can also use the UK spelling of "summarise" using the `summarise()` function. -```{r, eval=FALSE} +As shown in Figures \@ref(fig:sum1) and \@ref(fig:sum2), the `weather` data frame's many rows will be collapsed into a single row of just the summary values, in this case the mean and standard deviation: + +```{r, eval=TRUE} summary_temp <- weather %>% - summarize(mean = mean(temp), - std_dev = sd(temp)) + summarize(mean = mean(temp), std_dev = sd(temp)) summary_temp ``` - - - -``` -# A tibble: 1 x 2 - mean std_dev - -1 NA NA -``` ```{r, echo=FALSE, eval=FALSE} options(knitr.kable.NA = '') summary_temp <- weather %>% @@ -266,21 +276,19 @@ kable(summary_temp) %>% latex_options = c("HOLD_position")) ``` -We've created a small data frame here called `summary_temp` that includes both the `mean` and the `std_dev` of the `temp` variable in `weather`. Notice as shown in Figures \@ref(fig:sum1) and \@ref(fig:sum2), the data frame `weather` went from many rows to a single row of just the summary values in the data frame `summary_temp`. - -But why are the values returned `NA`? This stands for "not available or not applicable" and is how R encodes *missing values*; if in a data frame for a particular row and column no value exists, `NA` is stored instead. Furthermore, by default any time you try to summarize a number of values (using `mean()` and `sd()` for example) that has one or more missing values, then `NA` is returned. +Why are the values returned `NA`? As we saw in Section \@ref(geompoint) when creating the scatterplot of departure and arrival delays for `alaska_flights`, `NA` is how R encodes *missing values* where `NA` indicates "not available" or "not applicable." 
If a value for a particular row and a particular column does not exist, `NA` is stored instead. Values can be missing for many reasons. Perhaps the data was collected but someone forgot to enter it? Perhaps the data was not collected at all because it was too difficult? Perhaps there was an erroneous value that someone entered that has been corrected to read as missing? You'll often encounter issues with missing values when working with real data.

-Values can be missing for many reasons. Perhaps the data was collected but someone forgot to enter it? Perhaps the data was not collected at all because it was too difficult? Perhaps there was an erroneous value that someone entered that has been correct to read as missing? You'll often encounter issues with missing values.

+Going back to our `summary_temp` output above, by default any time you try to calculate a summary statistic of a variable that has one or more `NA` missing values in R, then `NA` is returned. To work around this fact, you can set the `na.rm` argument to `TRUE`, where `rm` is short for "remove"; this will ignore any `NA` missing values and only return the summary value for all non-missing values.

-You can summarize all non-missing values by setting the `na.rm` argument to TRUE (`rm` is short for "remove"). This will remove any `NA` missing values and only return the summary value for all non-missing values. So the code below computes the mean and standard deviation of all non-missing values. Notice how the `na.rm=TRUE` are set as arguments to the `mean()` and `sd()` functions, and not to the `summarize()` function.

+The code below computes the mean and standard deviation of all non-missing values of `temp`. Notice how `na.rm = TRUE` is used as an argument to the `mean()` and `sd()` functions individually, and not to the `summarize()` function.

-```{r, eval=FALSE}
+```{r, eval = TRUE}
summary_temp <- weather %>% 
  summarize(mean = mean(temp, na.rm = TRUE), 
            std_dev = sd(temp, na.rm = TRUE))
summary_temp
```
-```{r, echo=FALSE}
+```{r, echo=FALSE, eval=FALSE}
summary_temp <- weather %>% 
  summarize(mean = mean(temp, na.rm = TRUE), 
            std_dev = sd(temp, na.rm = TRUE))
kable(summary_temp) %>% 
  latex_options = c("HOLD_position"))
```

-It is not good practice to include a `na.rm = TRUE` in your summary commands by default; you should attempt to run code first without this argument as this will alert you to the presence of missing data. Only after you've identified where missing values occur and have thought about the potential causes of this missing should you consider using `na.rm = TRUE`. In the upcoming Learning Checks we'll consider the possible ramifications of blindly sweeping rows with missing values under the rug.

+However, one needs to be cautious whenever ignoring missing values as we've done above. In the upcoming Learning Checks we'll consider the possible ramifications of blindly sweeping rows with missing values "under the rug." This is in fact why the `na.rm` argument to any summary statistic function in R is set to `FALSE` by default; in other words, do not ignore rows with missing values by default. R is alerting you to the presence of missing data and you should be mindful of this missingness and any potential causes of this missingness throughout your analysis.

-What other summary functions can we use inside the `summarize()` verb? Any function in R that takes a vector of values and returns just one.
Here are just a few: +What are other summary statistic functions can we use inside the `summarize()` verb? As seen in Figure \@ref(fig:sum2), you can use any function in R that takes many values and returns just one. Here are just a few: * `mean()`: the mean AKA the average * `sd()`: the standard deviation, which is a measure of spread @@ -329,7 +329,7 @@ summary_temp <- weather %>% ---- +*** @@ -339,14 +339,7 @@ summary_temp <- weather %>% knitr::include_graphics("images/group_summary.png") ``` -It's often more useful to summarize a variable based on the groupings of another variable. Let's say, we are interested in the mean and standard deviation of temperatures but *grouped by month*. To be more specific: we want the mean and standard deviation of temperatures - -1. split by month. -1. sliced by month. -1. aggregated by month. -1. collapsed over month. - -Run the following code: +Say instead of the a single mean temperature for the whole year, you would like 12 mean temperatures, one for each of the 12 months separately? In other words, we would like to compute the mean temperature split by month AKA sliced by month AKA aggregated by month. We can do this by "grouping" temperature observations by the values of another variable, in this case by the 12 values of the variable `month`. Run the following code: ```{r, eval=FALSE} summary_monthly_temp <- weather %>% @@ -365,21 +358,50 @@ kable(summary_monthly_temp) %>% latex_options = c("HOLD_position")) ``` -This code is identical to the previous code that created `summary_temp`, with an extra `group_by(month)` added. Grouping the `weather` dataset by `month` and then passing this new data frame into `summarize` yields a data frame that shows the mean and standard deviation of temperature for each month in New York City. Note: Since each row in `summary_monthly_temp` represents a summary of different rows in `weather`, the observational units have changed. +This code is identical to the previous code that created `summary_temp`, but with an extra `group_by(month)` added before the `summarize()`. Grouping the `weather` dataset by `month` and then applying the `summarize()` functions yields a data frame that displays the mean and standard deviation temperature split by the 12 months of the year. + +It is important to note that the `group_by()` function doesn't change data frames by itself. Rather it changes the *meta-data*, or data about the data, specifically the group structure. It is only after we apply the `summarize()` function that the data frame changes. For example, let's consider the `diamonds` data frame included in the `ggplot2` package. Run this code, specifically in the console: -It is important to note that `group_by` doesn't change the data frame. It sets *meta-data* (data about the data), specifically the group structure of the data. It is only after we apply the `summarize` function that the data frame changes. +```{r, eval=TRUE} +diamonds +``` -If we would like to remove this group structure meta-data, we can pipe the resulting data frame into the `ungroup()` function. For example, say the group structure meta-data is set to be by month via `group_by(month)`, all future summarizations will be reported on a month-by-month basis. If however, we would like to no longer have this and have all summarizations be for all data in a single group (in this case over the entire year of 2013), then pipe the data frame in question through and `ungroup()` to remove this. 
+Observe that the first line of the output reads `# A tibble: 53,940 x 10`. This is an example of meta-data, in this case the number of observations/rows and variables/columns in `diamonds`. The actual data itself are the subsequent table of values. -We now revisit the `n()` counting summary function we introduced in the previous section. For example, suppose we'd like to get a sense for how many flights departed each of the three airports in New York City: +Now let's pipe the `diamonds` data frame into `group_by(cut)`. Run this code, specifically in the console: -```{r, eval=FALSE} +```{r, eval=TRUE} +diamonds %>% + group_by(cut) +``` + +Observe that now there is additional meta-data: `# Groups: cut [5]` indicating that the grouping structure meta-data has been set based on the 5 possible values AKA levels of the categorical variable `cut`: `"Fair"`, `"Good"`, `"Very Good"`, `"Premium"`, `"Ideal"`. On the other hand observe that the data has not changed: it is still a table of 53,940 $\times$ 10 values. + +Only by combining a `group_by()` with another data wrangling operation, in this case `summarize()` will the actual data be transformed. + +```{r, eval=TRUE} +diamonds %>% + group_by(cut) %>% + summarize(avg_price = mean(price)) +``` + +If we would like to remove this group structure meta-data, we can pipe the resulting data frame into the `ungroup()` function. Observe how the `# Groups: cut [5]` meta-data is no longer present. Run this code, specifically in the console: + +```{r, eval=TRUE} +diamonds %>% + group_by(cut) %>% + ungroup() +``` + +Let's now revisit the `n()` counting summary function we introduced in the previous section. For example, suppose we'd like to count how many flights departed each of the three airports in New York City: + +```{r, eval=TRUE} by_origin <- flights %>% group_by(origin) %>% summarize(count = n()) by_origin ``` -```{r, echo=FALSE} +```{r, echo=FALSE, eval=FALSE} by_origin <- flights %>% group_by(origin) %>% summarize(count = n()) @@ -388,12 +410,12 @@ kable(by_origin) %>% latex_options = c("HOLD_position")) ``` -We see that Newark (`"EWR"`) had the most flights departing in 2013 followed by `"JFK"` and lastly by LaGuardia (`"LGA"`). Note there is a subtle but important difference between `sum()` and `n()`. While `sum()` simply adds up a large set of numbers, the latter counts the number of times each of many different values occur. +We see that Newark (`"EWR"`) had the most flights departing in 2013 followed by `"JFK"` and lastly by LaGuardia (`"LGA"`). Note there is a subtle but important difference between `sum()` and `n()`; While `sum()` returns the sum of a numerical variable, `n()` returns counts of the the number of rows/observations. ### Grouping by more than one variable -You are not limited to grouping by one variable! Say you wanted to know the number of flights leaving each of the three New York City airports *for each month*, we can also group by a second variable `month`: `group_by(origin, month)`. +You are not limited to grouping by one variable! Say you wanted to know the number of flights leaving each of the three New York City airports *for each month*, we can also group by a second variable `month`: `group_by(origin, month)`. We see there are 36 rows to `by_origin_monthly` because there are 12 months for 3 airports (`EWR`, `JFK`, and `LGA`). 
```{r} by_origin_monthly <- flights %>% @@ -402,7 +424,7 @@ by_origin_monthly <- flights %>% by_origin_monthly ``` -We see there are 36 rows to `by_origin_monthly` because there are 12 months times 3 airports (`EWR`, `JFK`, and `LGA`). Why do we `group_by(origin, month)` and not `group_by(origin)` and then `group_by(month)`? Let's investigate: +Why do we `group_by(origin, month)` and not `group_by(origin)` and then `group_by(month)`? Let's investigate: ```{r} by_origin_monthly_incorrect <- flights %>% @@ -412,20 +434,7 @@ by_origin_monthly_incorrect <- flights %>% by_origin_monthly_incorrect ``` -What happened here is that the second `group_by(month)` overrode the first `group_by(origin)`, so that in the end we are only grouping by `month`. The lesson here, is if you want to `group_by()` two or more variables, you should include all these variables in a single `group_by()` function call. - - - - +What happened here is that the second `group_by(month)` overrode the group structure meta-data of the first `group_by(origin)`, so that in the end we are only grouping by `month`. The lesson here is if you want to `group_by()` two or more variables, you should include all these variables in a single `group_by()` function call. ```{block lc-groupby, type='learncheck', purl=FALSE} **_Learning check_** @@ -446,7 +455,7 @@ by_monthly_origin ---- +*** @@ -456,7 +465,39 @@ by_monthly_origin knitr::include_graphics("images/mutate.png") ``` -When looking at the `flights` dataset, there are some clear additional variables that could be calculated based on the values of variables already in the dataset. Passengers are often frustrated when their flights depart late, but change their mood a bit if pilots can make up some time during the flight to get them to their destination close to when they expected to land. This is commonly referred to as "gain" and we will create this variable using the `mutate` function. Note that we have also overwritten the `flights` data frame with what it was before as well as an additional variable `gain` here, or put differently, the `mutate()` command outputs a new data frame which then gets saved over the original `flights` data frame. +Another common transformation of data is to create/compute new variables based on existing ones. For example, say you are more comfortable thinking of temperature in degrees Celsius °C and not degrees Farenheit °F. The formula to convert temperatures from °F to °C is: + +$$ +\text{temp in C} = \frac{\text{temp in F} - 32}{1.8} +$$ + +We can apply this formula to the `temp` variable using the `mutate()` function, which takes existing variables and mutates them to create new ones. + +```{r, eval=FALSE} +weather <- weather %>% + mutate(temp_in_C = (temp-32)/1.8) +View(weather) +```` +```{r, eval=TRUE, echo=FALSE} +weather <- weather %>% + mutate(temp_in_C = (temp-32)/1.8) +```` + +Note that we have overwritten the original `weather` data frame with a new version that now includes the additional variable `temp_in_C`. In other words, the `mutate()` command outputs a new data frame which then gets saved over the original `weather` data frame. Furthermore, note how in `mutate()` we used `temp_in_C = (temp-32)/1.8` to create a new variable `temp_in_C`. + +Why did we overwrite the data frame `weather` instead of assigning the result to a new data frame like `weather_new`, but on the other hand why did we *not* overwrite `temp`, but instead created a new variable called `temp_in_C`? 
As a rough rule of thumb, as long as you are not losing original information that you might need later, it's acceptable practice to overwrite existing data frames. On the other hand, had we used `mutate(temp = (temp-32)/1.8)` instead of `mutate(temp_in_C = (temp-32)/1.8)`, we would have overwritten the original variable `temp` and lost its values. + +Let's compute average monthly temperatures in both °F and °C using the similar `group_by()` and `summarize()` code as in the previous section. + +```{r} +summary_monthly_temp <- weather %>% + group_by(month) %>% + summarize(mean_temp_in_F = mean(temp, na.rm = TRUE), + mean_temp_in_C = mean(temp_in_C, na.rm = TRUE)) +summary_monthly_temp +```` + +Let's consider another example. Passengers are often frustrated when their flights depart late, but change their mood a bit if pilots can make up some time during the flight to get them to their destination close to the original arrival time. This is commonly referred to as "gain" and we will create this variable using the `mutate()` function. ```{r} flights <- flights %>% @@ -473,8 +514,6 @@ flights %>% The flight in the first row departed 2 minutes late but arrived 11 minutes late, so its "gained time in the air" is actually a loss of 9 minutes, hence its `gain` is `-9`. Contrast this to the flight in the fourth row which departed a minute early (`dep_delay` of `-1`) but arrived 18 minutes early (`arr_delay` of `-18`), so its "gained time in the air" is 17 minutes, hence its `gain` is `+17`. -Why did we overwrite `flights` instead of assigning the resulting data frame to a new object, like `flights_with_gain`? As a rough rule of thumb, as long as you are not losing information that you might need later, it's acceptable practice to overwrite data frames. However, if you overwrite existing variables and/or change the observational units, recovering the original information might prove difficult. In this case, it might make sense to create a new data object. - Let's look at summary measures of this `gain` variable and even plot it in the form of a histogram: ```{r, eval=FALSE} @@ -541,15 +580,15 @@ flights <- flights %>% ---- +*** ## `arrange` and sort rows {#arrange} -One of the most common things people working with data would like to do is sort the data frames by a specific variable in a column. Have you ever been asked to calculate a median by hand? This requires you to put the data in order from smallest to highest in value. The `dplyr` package has a function called `arrange` that we will use to sort/reorder our data according to the values of the specified variable. This is often used after we have used the `group_by` and `summarize` functions as we will see. +One of the most common tasks people working with data would like to perform is sort the data frame's rows in alphanumeric order of the values in a variable/column. For example, when calculating a median by hand requires you to first sort the data from the smallest to highest in value and then identify the "middle" value. The `dplyr` package has a function called `arrange()` that we will use to sort/reorder a data frame's rows according to the values of the specified variable. This is often used after we have used the `group_by()` and `summarize()` functions as we will see. 
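+As a quick first taste of the syntax before the fuller example below (and assuming the `dplyr` and `nycflights13` packages are loaded as elsewhere in this chapter), here is a minimal sketch that sorts the rows of the `weather` data frame by temperature:
+
+```{r, eval=FALSE}
+# Sort hourly weather records in ascending order of temp:
+weather %>% 
+  arrange(temp)
+
+# Sort them in descending order of temp instead:
+weather %>% 
+  arrange(desc(temp))
+```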
-Let's suppose we were interested in determining the most frequent destination airports from New York City in 2013:
+Let's suppose we were interested in determining the most frequent destination airports for all domestic flights departing from New York City in 2013:

```{r, eval}
freq_dest <- flights %>% 
  group_by(dest) %>% 
  summarize(num_flights = n())
freq_dest
```

-You'll see that by default the values of `dest` are displayed in alphabetical order here. We are interested in finding those airports that appear most:
+Observe that by default the rows of the resulting `freq_dest` data frame are sorted in alphabetical order of the `dest` destination airport code. Say instead we would like to see the same data, but sorted from the most to the least number of flights `num_flights`:

```{r}
freq_dest %>% 
  arrange(num_flights)
```

-This is actually giving us the opposite of what we are looking for. It tells us the least frequent destination airports first. To switch the ordering to be descending instead of ascending we use the `desc` (`desc`ending) function:
+This is actually giving us the opposite of what we are looking for: the rows are sorted with the least frequent destination airports displayed first. To switch the ordering to be descending instead of ascending we use the `desc()` function, which is short for "descending":

```{r}
freq_dest %>% 
  arrange(desc(num_flights))
```

+In other words, `arrange()` sorts in ascending order by default unless you override this default behavior by using `desc()`.

----
+
+***

## `join` data frames {#joins}

-Another common task is joining AKA merging two different datasets. For example, in the `flights` data, the variable `carrier` lists the carrier code for the different flights. While `"UA"` and `"AA"` might be somewhat easy to guess for some (United and American Airlines), what are "VX", "HA", and "B6"? This information is provided in a separate data frame `airlines`.
+Another common data transformation task is "joining" or "merging" two different datasets. For example, in the `flights` data frame the variable `carrier` lists the carrier code for the different flights. While the corresponding airline names for `"UA"` and `"AA"` might be somewhat easy to guess (United and American Airlines), what airlines have the codes `"VX"`, `"HA"`, and `"B6"`? This information is provided in a separate data frame `airlines`.

```{r eval=FALSE}
View(airlines)
```

-We see that in `airports`, `carrier` is the carrier code while `name` is the full name of the airline. Using this table, we can see that "VX", "HA", and "B6" correspond to Virgin America, Hawaiian Airlines, and JetBlue respectively. However, will we have to continually look up the carrier's name for each flight in the `airlines` dataset? No! Instead of having to do this manually, we can have R automatically do the "looking up" for us.
+We see that in `airlines`, `carrier` is the carrier code while `name` is the full name of the airline company. Using this table, we can see that `"VX"`, `"HA"`, and `"B6"` correspond to Virgin America, Hawaiian Airlines, and JetBlue respectively. However, wouldn't it be nice to have all this information in a single data frame instead of two separate data frames? We can do this by "joining" i.e. "merging" the `flights` and `airlines` data frames.

-Note that the values in the variable `carrier` in `flights` match the values in the variable `carrier` in `airlines`. In this case, we can use the variable `carrier` as a *key variable* to join/merge/match the two data frames by.
Key variables are almost always identification variables that uniquely identify the observational units as we saw back in Subsection \@ref(identification-vs-measurement) on identification vs measurement variables. This ensures that rows in both data frames are appropriate matched during the join. Hadley and Garrett [@rds2016] created the following diagram to help us understand how the different datasets are linked by various key variables:
+Note that the values in the variable `carrier` in the `flights` data frame match the values in the variable `carrier` in the `airlines` data frame. In this case, we can use the variable `carrier` as a *key variable* to match the rows of the two data frames. Key variables are almost always identification variables that uniquely identify the observational units as we saw in Subsection \@ref(identification-vs-measurement-variables). This ensures that rows in both data frames are appropriately matched during the join. Hadley and Garrett [@rds2016] created the following diagram to help us understand how the different datasets are linked by various key variables:

```{r reldiagram, echo=FALSE, fig.cap="Data relationships in nycflights13 from R for Data Science", purl=FALSE}
knitr::include_graphics("images/relational-nycflights.png")
```

-### Joining by "key" variables
+### Matching "key" variable names

-In both `flights` and `airlines`, the key variable we want to join/merge/match the two data frames with has the same name in both datasets: `carriers`. We make use of the `inner_join()` function to join by the variable `carrier`.
+In both the `flights` and `airlines` data frames, the key variable we want to join/merge/match the rows of the two data frames by has the same name: `carrier`. We make use of the `inner_join()` function to join the two data frames, where the rows will be matched by the variable `carrier`.

```{r eval=FALSE}
flights_joined <- flights %>% 
  inner_join(airlines, by = "carrier")
View(flights)
View(flights_joined)
```

-We observed that the `flights` and `flights_joined` are identical except that `flights_joined` has an additional variable `name` whose values were drawn from `airlines`.
+Observe that the `flights` and `flights_joined` data frames are identical except that `flights_joined` has an additional variable `name` whose values correspond to the airline company names drawn from the `airlines` data frame.

-A visual representation of the `inner_join` is given below [@rds2016]:
+A visual representation of the `inner_join()` is given below [@rds2016]. There are other types of joins available (such as `left_join()`, `right_join()`, `full_join()`, and `anti_join()`), but the `inner_join()` will solve nearly all of the problems you'll encounter in this book.

```{r ijdiagram, echo=FALSE, fig.cap="Diagram of inner join from R for Data Science", purl=FALSE}
knitr::include_graphics("images/join-inner.png")
```

-There are more complex joins available, but the `inner_join` will solve nearly all of the problems you'll face in our experience.

-### Joining by "key" variables with different names

-Say instead, you are interested in all the destinations of flights from NYC in 2013 and ask yourself:
+
+### Different "key" variable names {#diff-key}
+
+Say instead you are interested in the destinations of all domestic flights departing NYC in 2013 and ask yourself:

- "What cities are these airports in?"
- "Is `"ORD"` Orlando?
@@ -629,16 +670,17 @@ The `airports` data frame contains airport codes: View(airports) ``` -However, looking at both the `airports` and `flights` and the visual representation of the relations between the data frames in Figure \@ref(fig:ijdiagram), we see that in: +However, looking at both the `airports` and `flights` frames and the visual representation of the relations between these data frames in Figure \@ref(fig:ijdiagram) above, we see that in: -* `airports` the airport code is in the variable `faa` -* `flights` the airport code is in the variables `origin` and `dest` (destination) +* the `airports` data frame the airport code is in the variable `faa` +* the `flights` data frame the airport codes are in the variables `origin` and `dest` -So to join these two datasets so that we can identify the destination cities, our `inner_join` operation involves a `by` argument that accounts for the different names: +So to join these two data frames so that we can identify the destination cities for example, our `inner_join()` operation will use the `by = c("dest" = "faa")` argument, which allows us to join two data frames where the key variable has a different name: ```{r, eval=FALSE} -flights %>% +flights_with_airport_names <- flights %>% inner_join(airports, by = c("dest" = "faa")) +View(flights_with_airport_names) ``` Let's construct the sequence of commands that computes the number of flights from NYC to each destination, but also includes information about each destination airport: @@ -653,19 +695,18 @@ named_dests <- flights %>% named_dests ``` -In case you didn't know, `"ORD"` is the airport code of Chicago O'Hare airport and `"FLL"` is the main airport in Fort Lauderdale, Florida, which we can now see in our `named_dests` data frame. +In case you didn't know, `"ORD"` is the airport code of Chicago O'Hare airport and `"FLL"` is the main airport in Fort Lauderdale, Florida, which we can now see in the `airport_name` variable in the resulting `named_dests` data frame. -### Joining by multiple "key" variables +### Multiple "key" variables Say instead we are in a situation where we need to join by multiple variables. For example, in Figure \@ref(fig:reldiagram) above we see that in order to join the `flights` and `weather` data frames, we need more than one key variable: `year`, `month`, `day`, `hour`, and `origin`. This is because the combination of these 5 variables act to uniquely identify each observational unit in the `weather` data frame: hourly weather recordings at each of the 3 NYC airports. -We achieve this by specifying a vector of key variables to join by using the `c()` concatenate function. Note the individual variables need to be wrapped in quotation marks. +We achieve this by specifying a vector of key variables to join by using the `c()` function for "combine" or "concatenate" that we saw earlier: -```{r} +```{r, eval=FALSE} flights_weather_joined <- flights %>% - inner_join(weather, - by = c("year", "month", "day", "hour", "origin")) -flights_weather_joined + inner_join(weather, by = c("year", "month", "day", "hour", "origin")) +View(flights_weather_joined) ``` @@ -681,14 +722,43 @@ flights_weather_joined ``` +### Normal forms + +The data frames included in the `nycflights13` package are in a form that minimizes redundancy of data. For example, the `flights` data frame only saves the `carrier` code of the airline company; it does not include the actual name of the airline. 
For example the first row of `flights` has `carrier` equal to `UA`, but does it does not include the airline name "United Air Lines Inc." The names of the airline companies are included in the `name` variable of the `airlines` data frame. In order to have the airline company name included in `flights`, we could join these two data frames as follows: + +```{r eval=FALSE} +joined_flights <- flights %>% + inner_join(airlines, by = "carrier") +View(joined_flights) +``` + +We are capable of performing this join because each of the data frames have _keys_ in common to relate one to another: the `carrier` variable in both the `flights` and `airlines` data frames. The *key* variable(s) that we join are often *identification variables* we mentioned previously. + +This is an important property of what's known as **normal forms** of data. The process of decomposing data frames into less redundant tables without losing information is called **normalization**. More information is available on [Wikipedia](https://en.wikipedia.org/wiki/Database_normalization). + + +```{block, type='learncheck'} +**_Learning check_** +``` + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are some advantages of data in normal forms? What are some disadvantages? + +```{block, type='learncheck', purl=FALSE} +``` + + ---- +*** ## Other verbs {#other-verbs} -On top of the following examples of other verbs, if you'd like to see more examples on using `dplyr`, the data wrangling verbs we introduction in Section \@ref(verbs), and the pipe function `%>%` with the `nycflights13` dataset, check out [Chapter 5](http://r4ds.had.co.nz/transform.html) of Hadley and Garrett's book [@rds2016]. +Here are some other useful data wrangling verbs that might come in handy: + +* `select()` only a subset of variables/columns +* `rename()` variables/columns to have new names +* Return only the `top_n()` values of a variable ### `select` variables {#select} @@ -696,30 +766,30 @@ On top of the following examples of other verbs, if you'd like to see more examp knitr::include_graphics("images/select.png") ``` -We've seen that the `flights` data frame in the `nycflights13` package contains many different variables. The `names` function gives a listing of all the columns in a data frame; in our case you would run `names(flights)`. You can also identify these variables by running the `glimpse` function in the `dplyr` package: +We've seen that the `flights` data frame in the `nycflights13` package contains 19 different variables. You can identify the names of these 19 variables by running the `glimpse()` function from the `dplyr` package: ```{r, eval=FALSE} glimpse(flights) ``` -However, say you only want to consider two of these variables, say `carrier` and `flight`. You can `select` these: +However, say you only need two of these variables, say `carrier` and `flight`. You can `select()` these two variables: ```{r, eval=FALSE} flights %>% select(carrier, flight) ``` -This function makes navigating datasets with a very large number of variables easier for humans by restricting consideration to only those of interest, like `carrier` and `flight` above. So for example, this might make viewing the dataset using the `View()` spreadsheet viewer more digestible. However, as far as the computer is concerned it doesn't care how many additional variables are in the dataset in question, so long as `carrier` and `flight` are included. 
+This function makes exploring data frames with a very large number of variables easier for humans to process by restricting consideration to only those we care about, like our example with `carrier` and `flight` above. This might make viewing the dataset using the `View()` spreadsheet viewer more digestible. However, as far as the computer is concerned, it doesn't care how many additional variables are in the data frame in question, so long as `carrier` and `flight` are included. -Another example involves the variable `year`. If you remember the original description of the `flights` data frame (or by running `?flights`), you'll remember that this data correspond to flights in 2013 departing New York City. The `year` variable isn't really a variable here in that it doesn't vary... `flights` actually comes from a larger dataset that covers many years. We may want to remove the `year` variable from our dataset since it won't be helpful for analysis in this case. We can deselect `year` by using the `-` sign: +Let's say instead you want to drop i.e deselect certain variables. For example, take the variable `year` in the `flights` data frame. This variable isn't quite a "variable" in the sense that all the values are `2013` i.e. it doesn't change. Say you want to remove the `year` variable from the data frame; we can deselect `year` by using the `-` sign: ```{r, eval=FALSE} flights_no_year <- flights %>% select(-year) -names(flights_no_year) +glimpse(flights_no_year) ``` -Or we could specify a ranges of columns: +Another way of selecting columns/variables is by specifying a range of columns: ```{r, eval=FALSE} flight_arr_times <- flights %>% @@ -727,15 +797,15 @@ flight_arr_times <- flights %>% flight_arr_times ``` -The `select` function can also be used to reorder columns in combination with the `everything` helper function. Let's suppose we'd like the `hour`, `minute`, and `time_hour` variables, which appear at the end of the `flights` dataset, to actually appear immediately after the `day` variable: +The `select()` function can also be used to reorder columns in combination with the `everything()` helper function. Let's suppose we'd like the `hour`, `minute`, and `time_hour` variables, which appear at the end of the `flights` dataset, to appear immediately after the `year`, `month`, and `day` variables while keeping the rest of the variables. In the code below `everything()` picks up all remaining variables. ```{r, eval=FALSE} flights_reorder <- flights %>% - select(month:day, hour:time_hour, everything()) -names(flights_reorder) + select(year, month, day, hour, minute, time_hour, everything()) +glimpse(flights_reorder) ``` -in this case `everything()` picks up all remaining variables. Lastly, the helper functions `starts_with`, `ends_with`, and `contains` can be used to choose column names that match those conditions: +Lastly, the helper functions `starts_with()`, `ends_with()`, and `contains()` can be used to select variables/column that match those conditions. For example: ```{r, eval=FALSE} flights_begin_a <- flights %>% @@ -757,35 +827,28 @@ flights_time ### `rename` variables {#rename} -Another useful function is `rename`, which as you may suspect renames one column to another name. Suppose we wanted `dep_time` and `arr_time` to be `departure_time` and `arrival_time` instead in the `flights_time` data frame: +Another useful function is `rename()`, which as you may have guessed renames one column to another name. 
Suppose we want `dep_time` and `arr_time` to be `departure_time` and `arrival_time` instead in the `flights_time` data frame:

```{r, eval=FALSE}
flights_time_new <- flights %>% 
  select(contains("time")) %>% 
  rename(departure_time = dep_time,
         arrival_time = arr_time)
-names(flights_time)
+glimpse(flights_time_new)
```

-Note that in this case we used a single `=` sign with the `rename()`. Ex: `departure_time = dep_time`. This is because we are not testing for equality like we would using `==`, but instead we want to assign a new variable `departure_time` to have the same values as `dep_time` and then delete the variable `dep_time`.
-
-
-It's easy to forget if the new name comes before or after the equals sign. I usually remember this as "New Before, Old After" or NBOA. You'll receive an error if you try to do it the other way:
-
-```
-Error: Unknown variables: departure_time, arrival_time.
-```
+Note that in this case we used a single `=` sign within the `rename()`, for example `departure_time = dep_time`. This is because we are not testing for equality like we would using `==`, but instead we want to assign a new variable `departure_time` to have the same values as `dep_time` and then delete the variable `dep_time`. It's easy to forget if the new name comes before or after the equals sign. I usually remember this as "New Before, Old After" or NBOA. 

### `top_n` values of a variable

-We can also use the `top_n` function which automatically tells us the most frequent `num_flights`. We specify the top 10 airports here:
+We can also return the top `n` values of a variable using the `top_n()` function. For example, we can return a data frame of the top 10 destination airports using the example from Section \@ref(diff-key). Observe that we set the number of values to return to `n = 10` and `wt = num_flights` to indicate that we want the rows corresponding to the top 10 values of `num_flights`. See the help file for `top_n()` by running `?top_n` for more information.

```{r, eval=FALSE}
named_dests %>% 
  top_n(n = 10, wt = num_flights)
```

-We'll still need to arrange this by `num_flights` though:
+Let's further `arrange()` these results in descending order of `num_flights`:

```{r, eval=FALSE}
named_dests  %>% 
  top_n(n = 10, wt = num_flights) %>% 
@@ -793,18 +856,6 @@ named_dests %>%
   arrange(desc(num_flights))
```

-**Note:** Remember that I didn't pull the `n` and `wt` arguments out of thin air. They can be found by using the `?` function on `top_n`.
-
-We can go one stop further and tie together the `group_by` and `summarize` functions we used to find the most frequent flights:
-
-```{r, eval=FALSE}
-ten_freq_dests <- flights %>%
-  group_by(dest) %>%
-  summarize(num_flights = n()) %>%
-  arrange(desc(num_flights)) %>%
-  top_n(n = 10)
-View(ten_freq_dests)
-```

```{block lc-other-verbs, type='learncheck', purl=FALSE}
**_Learning check_**
```

@@ -823,7 +874,7 @@ View(ten_freq_dests)


----
+***


@@ -831,7 +882,7 @@ View(ten_freq_dests)


### Summary table

-Let's recap a selection of verbs in Table \@ref(tab:wrangle-summary-table) summarizing their differences. Using these verbs and the pipe `%>%` operator from Section \@ref(piping), you'll be able to write easily legible code to perform almost all the data wrangling necessary for the rest of this book.
+Let's recap our data wrangling verbs in Table \@ref(tab:wrangle-summary-table). Using these verbs and the pipe `%>%` operator from Section \@ref(piping), you'll be able to write easily legible code to perform almost all the data wrangling necessary for the rest of this book.
```{r wrangle-summary-table, echo=FALSE, message=FALSE} # The following Google Doc is published to CSV and loaded below using read_csv() below: @@ -839,15 +890,15 @@ Let's recap a selection of verbs in Table \@ref(tab:wrangle-summary-table) summa "https://docs.google.com/spreadsheets/d/e/2PACX-1vRgwl1lugQA6zxzfB6_0hM5vBjXkU7cbUVYYXLcWeaRJ9HmvNXyCjzJCgiGW8HCe1kvjLCGYHf-BvYL/pub?gid=0&single=true&output=csv" %>% read_csv(na = "") %>% - rename_(" " = "X1") %>% + select(-X1) %>% kable( caption = "Summary of data wrangling verbs", booktabs = TRUE ) %>% kable_styling(font_size = ifelse(knitr:::is_latex_output(), 10, 16), latex_options = c("HOLD_position")) %>% - column_spec(2, width = "0.9in") %>% - column_spec(3, width = "3.3in") + column_spec(1, width = "0.9in") %>% + column_spec(2, width = "3.3in") ``` ```{block lc-asm, type='learncheck', purl=FALSE} @@ -881,6 +932,8 @@ You can access this cheatsheet by going to the RStudio Menu Bar -> Help -> Cheat include_graphics("images/dplyr_cheatsheet-1.png") ``` +On top of data wrangling verbs and examples we presented in this section, if you'd like to see more examples of using the `dplyr` package for data wrangling check out [Chapter 5](http://r4ds.had.co.nz/transform.html) of Garrett Grolemund and Hadley Wickham's and Garrett's book [@rds2016]. + -### Importing via RStudio's interface +### Using RStudio's interface Let's read in the exact same data saved in Excel format, but this time via RStudio's graphical interface instead of via the R console. First download the Excel file `dem_score.xlsx` by clicking here, then 1. Go to the Files panel of RStudio. -2. Navigate to the directory where your downloaded `dem_score.xlsx` is saved. -3. Click on `dem_score.xlsx` +2. Navigate to the directory i.e. folder on your computer where the downloaded `dem_score.xlsx` Excel file is saved. +3. Click on `dem_score.xlsx`. 4. Click "Import Dataset..." At this point you should see an image like this: ![](images/read_excel.png) -After clicking on the "Import" button on the bottom right RStudio save this spreadsheet's data in a data frame called `dem_score` and display its contents in the spreadsheet viewer. Furthermore on the bottom right you'll see the code that read in your data in the console; you can copy and paste this code to reload your data again later automatically instead of repeating the above manual process. +After clicking on the "Import" button on the bottom right RStudio, RStudio will save this spreadsheet's data in a data frame called `dem_score` and display its contents in the spreadsheet viewer. Furthermore, note in the bottom right of the above image there exists a "Code Preview": you can copy and paste this code to reload your data again later automatically instead of repeating the above manual point-and-click process. ---- +*** -## Tidy data +## Tidy data {#tidy-data-ex} -Let's now switch gears and learn about the concept of "tidy" data format. Let's start with a motivating example. Let's consider the `drinks` data frame included in the `fivethirtyeight` data. Run the +Let's now switch gears and learn about the concept of "tidy" data format by starting with a motivating example. Let's consider the `drinks` data frame included in the `fivethirtyeight` data. 
Run the following: ```{r} drinks ``` -After reading the help file by running `?drinks` we see that is a data frame containing results from a survey of the average number of servings of beer, spirits, and wine consumed for 193 countries originally reported on the data journalism website FiveThirtyEight.com's article ["Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?"](https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/). +After reading the help file by running `?drinks`, we see that `drinks` is a data frame containing results from a survey of the average number of servings of beer, spirits, and wine consumed for 193 countries. This data was originally reported on the data journalism website FiveThirtyEight.com in Mona Chalabi's article ["Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?"](https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/) + +Let's apply some of the data wrangling verbs we learned in Chapter \@ref(wrangling) on the `drinks` data frame. Let's + +1. `filter()` the `drinks` data frame to only consider 4 countries (the United States, China, Italy, and Saudi Arabia) then +1. `select()` all columns except `total_litres_of_pure_alcohol` by using `-` sign, then +1. `rename()` the variables `beer_servings`, `spirit_servings`, and `wine_servings` to `beer`, `spirit`, and `wine` respectively -Let's filter `drinks` to only consider 4 countries: the US, China, Italy, and Saudi Arabia; drop the column `total_litres_of_pure_alcohol` by using `select()` with a `-` sign; and rename the variables `beer_servings`, `spirit_servings`, and `wine_servings` to read `beer`, `spirit`, and `wine`. +and save the resulting data frame in `drinks_smaller`. ```{r} drinks_smaller <- drinks %>% @@ -134,7 +139,7 @@ drinks_smaller <- drinks %>% drinks_smaller ``` -Using `drinks_smaller`, how would we create the side-by-side AKA dodged barplot in Figure \@ref(fig:drinks-smaller); recall we saw barplots displaying two categorical variables in Section \@ref(two-categ-barplot). +Using the `drinks_smaller` data frame, how would we create the side-by-side AKA dodged barplot in Figure \@ref(fig:drinks-smaller)? Recall we saw barplots displaying two categorical variables in Section \@ref(two-categ-barplot). ```{r drinks-smaller, fig.cap="Alcohol consumption in 4 countries.", fig.height=3.5, echo=FALSE} drinks_smaller_tidy <- drinks_smaller %>% @@ -146,20 +151,27 @@ ggplot(drinks_smaller_tidy, aes(x=country, y=servings, fill=type)) + Let's break down the Grammar of Graphics: -1. The categorical variable `country` with four levels (China, Italy, Saudi Arabia, USA) is mapped to the `x`-position of the bars. -1. The numerical variable `servings` is mapped to the `y`-position of the bars, in other words the height. -1. The cateogircal variable `type` with three levels (beer, spirit, wine) is mapped to the `fill` color of the bars. +1. The categorical variable `country` with four levels (China, Italy, Saudi Arabia, USA) would have to be mapped to the `x`-position of the bars. +1. The numerical variable `servings` would have to be mapped to the `y`-position of the bars, in other words the height of the bars. +1. The categorical variable `type` with three levels (beer, spirit, wine) who have to be mapped to the `fill` color of the bars. 
-Observe however that `drinks_smaller` has *three separate columns* for `beer`, `spirit`, and `wine`, whereas in order to recreate the side-by-side AKA dodged barplot in Figure \@ref(fig:drinks-smaller) we would need a *single column* `type` with three possible values: `beer`, `spirit`, and `wine`. In other words, for us to be able to create this barplot, our data frame would have to look like: +Observe however that `drinks_smaller` has *three separate variables* for `beer`, `spirit`, and `wine`, whereas in order to recreate the side-by-side AKA dodged barplot in Figure \@ref(fig:drinks-smaller) we would need a *single variable* `type` with three possible values: `beer`, `spirit`, and `wine`, which we would then map to the `fill` aesthetic. In other words, for us to be able to create the barplot in Figure \@ref(fig:drinks-smaller), our data frame would have to look like this: ```{r} drinks_smaller_tidy ``` -Observe that while `drinks_smaller` and `drinks_smaller_tidy` are both rectangular in shape and contain the same data on 4 countries average number of servings for 3 alcohol types, totalling 12 numerical values, they are formatted differently. `drinks_smaller` is formatted in what's known as ["wide"](https://en.wikipedia.org/wiki/Wide_and_narrow_data) format, whereas `drinks_smaller_tidy` is formated in what's known as ["long/narrow"](https://en.wikipedia.org/wiki/Wide_and_narrow_data#Narrow). "Long/narrow" format is as known in R circles as "tidy" format. +Let's compare the `drinks_smaller_tidy` with the `drinks_smaller` data frame from earlier: +```{r} +drinks_smaller +``` -### What is tidy data? +Observe that while `drinks_smaller` and `drinks_smaller_tidy` are both rectangular in shape and contain the same 12 numerical values (3 alcohol types $\times$ 4 countries), they are formatted differently. `drinks_smaller` is formatted in what's known as ["wide"](https://en.wikipedia.org/wiki/Wide_and_narrow_data) format, whereas `drinks_smaller_tidy` is formatted in what's known as ["long/narrow"](https://en.wikipedia.org/wiki/Wide_and_narrow_data#Narrow). In the context of using R, long/narrow format is also known as "tidy" format. Furthermore, in order to use the `ggplot2` and `dplyr` packages for data visualization and data wrangling, your input data frames *must* be in "tidy" format. So all non-"tidy" data must be converted to "tidy" format first. + +Before we show you how to convert non-"tidy" data frames like `drinks_smaller` to "tidy" data frames like `drinks_smaller_tidy`, let's go over the explicit definition of "tidy" data. + +### Definition of "tidy" data You have surely heard the word "tidy" in your life: @@ -168,7 +180,7 @@ You have surely heard the word "tidy" in your life: * Marie Kondo's best-selling book [_The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing_](https://www.amazon.com/Life-Changing-Magic-Tidying-Decluttering-Organizing/dp/1607747308/ref=sr_1_1?ie=UTF8&qid=1469400636&sr=8-1&keywords=tidying+up) and Netflix TV series [_Tidying Up with Marie Kondo_](https://www.netflix.com/title/80209379). * "I am not by any stretch of the imagination a tidy person, and the piles of unread books on the coffee table and by my bed have a plaintive, pleading quality to me - 'Read me, please!'" - Linda Grant -What does it mean for your data to be "tidy"? While "tidy" has a clear english meaning of "organized", "tidy" in the context of data science using R means that your data follows a standardized format. 
We will follow Hadley Wickham's definition of *tidy data* here [@tidy]: +What does it mean for your data to be "tidy"? While "tidy" has a clear English meaning of "organized", "tidy" in the context of data science using R means that your data follows a standardized format. We will follow Hadley Wickham's definition of *tidy data* here [@tidy]: > A dataset is a collection of values, usually either numbers (if quantitative) or strings AKA text data (if qualitative). Values are organised in two ways. @@ -185,13 +197,13 @@ are matched up with observations, variables and types. In *tidy data*: > 2. Each observation forms a row. > 3. Each type of observational unit forms a table. -```{r tidyfig, echo=FALSE, fig.cap="Tidy data graphic from http://r4ds.had.co.nz/tidy-data.html"} +```{r tidyfig, echo=FALSE, fig.cap="Tidy data graphic from [R for Data Science](http://r4ds.had.co.nz/tidy-data.html)."} knitr::include_graphics("images/tidy-1.png") ``` -For example, say the following table consists of stock prices: +For example, say you have the following table of stock prices in Table \@ref(tab:non-tidy-stocks): -```{r echo=FALSE} +```{r non-tidy-stocks, echo=FALSE} stocks <- data_frame( Date = as.Date('2009-01-01') + 0:4, `Boeing Stock Price` = paste("$", c("173.55", "172.61", "173.86", "170.77", "174.29"), sep = ""), @@ -209,9 +221,9 @@ stocks %>% latex_options = c("HOLD_position")) ``` -Although the data are neatly organized in a rectangular spreadsheet-type format, they are not in tidy format since there are three variables corresponding to three unique pieces of information (Date, Stock Name, and Stock Price), but there are not three columns. In tidy data format each variable should be its own column, as shown below. Notice that both tables present the same information, but in different formats. +Although the data are neatly organized in a rectangular spreadsheet-type format, they are not in tidy format because while there are three variables corresponding to three unique pieces of information (Date, Stock Name, and Stock Price), there are not three columns. In "tidy" data format each variable should be its own column, as shown in Table \@ref(tab:tidy-stocks). Notice that both tables present the same information, but in different formats. -```{r echo=FALSE} +```{r tidy-stocks, echo=FALSE} stocks_tidy <- stocks %>% rename( Boeing = `Boeing Stock Price`, @@ -229,9 +241,9 @@ stocks_tidy %>% latex_options = c("HOLD_position")) ``` -However, consider the following table +Now we have the requisite three columns Date, Stock Name, and Stock Price. On the other hand, consider the data in Table \@ref(tab:tidy-stocks-2). -```{r echo=FALSE} +```{r tidy-stocks-2, echo=FALSE} stocks <- data_frame( Date = as.Date('2009-01-01') + 0:4, `Boeing Price` = paste("$", c("173.55", "172.61", "173.86", "170.77", "174.29"), sep = ""), @@ -248,17 +260,31 @@ stocks %>% latex_options = c("HOLD_position")) ``` -In this case, even though the variable "Boeing Price" occurs again, the data *is* tidy since there are three variables corresponding to three unique pieces of information (Date, Boeing stock price, and the weather that particular day). +In this case, even though the variable "Boeing Price" occurs just like in our non-"tidy" data in Table \@ref(tab:non-tidy-stocks), the data *is* "tidy" since there are three variables corresponding to three unique pieces of information: Date, Boeing stock price, and the weather that particular day. 
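Although we won't introduce the necessary function until the next subsection, here is a minimal sketch of how the non-"tidy" stock prices in Table \@ref(tab:non-tidy-stocks) could be converted into the "tidy" format of Table \@ref(tab:tidy-stocks) using the `gather()` function from the `tidyr` package. This sketch assumes the non-"tidy" data is saved in a data frame called `stocks` whose first column is `Date` and whose remaining columns are the individual stock prices:

```{r, eval=FALSE}
# A sketch only: collapse all stock price columns (everything except Date)
# into a key column `Stock Name` and a value column `Stock Price`.
stocks %>% 
  gather(key = `Stock Name`, value = `Stock Price`, -Date)
```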
+
+```{block, type='learncheck'}
+**_Learning check_**
+```
+
+**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are common characteristics of "tidy" data frames?
+
+**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What makes "tidy" data frames useful for organizing data?
+
+```{block, type='learncheck', purl=FALSE}
+```


-### Converting to "tidy" format

-In this book so far, you've only seen data frames that were already in "tidy" format. Furthermore for the rest of this book, you'll only see data frames that are already in "tidy" format. This is not always the case however with data in the wild. If your original data is in wide AKA non-"tidy" format and you would like to use the `ggplot2` or `dplyr` packages on it, you will have to convert it "tidy" format using the `gather()` function in the `tidyr` package [@R-tidyr]. Going back to our `drinks_smaller` data frame
+### Converting to "tidy" data
+
+In this book so far, you've only seen data frames that were already in "tidy" format. Furthermore for the rest of this book, you'll mostly only see data frames that are already in "tidy" format as well. This is not always the case however with data in the wild. If your original data frame is in wide i.e. non-"tidy" format and you would like to use the `ggplot2` package for data visualization or the `dplyr` package for data wrangling, you will first have to convert it to "tidy" format using the `gather()` function in the `tidyr` package [@R-tidyr]. 
+
+Going back to our `drinks_smaller` data frame from earlier:

```{r}
drinks_smaller
```

-let's convert it to "tidy" format by using the `gather()` function from the `tidyr` package:
+We convert it to "tidy" format by using the `gather()` function from the `tidyr` package as follows:

```{r}
drinks_smaller_tidy <- drinks_smaller %>% 
@@ -266,13 +292,21 @@ drinks_smaller_tidy <- drinks_smaller %>% 
 drinks_smaller_tidy
```

-We set the
+We set the arguments to `gather()` as follows:

-1. `key` argument to be the name of the column/variable in the new "tidy" frame that contains the column names of the original data frame that you want to gather. Observe we set `key = type` and in the resulting `drinks_smaller_tidy` data frame, the column `type` contains the names `beer`, `spirit`, and `serving`.
-1. `value` argument to be the name of the column/variable in the "tidy" frame that contains the rows and columns of values in the original data frame you want to gather. Observe we set `value = servings` and in the resulting `drinks_smaller_tidy` data frame, the column `value` contains the 4 $\times$ 3 numerical values.
-1. Third argument to be the columns you want to or don't want to gather. Observe we set this to `-country` indicating that we don't want to gather the `country` variable and in the resulting `drinks_smaller_tidy` data frame there is still a variable `country`.
+1. `key` is the name of the column/variable in the new "tidy" frame that contains the column names of the original data frame that you want to tidy. Observe how we set `key = type` and in the resulting `drinks_smaller_tidy` the column `type` contains the three types of alcohol `beer`, `spirit`, and `wine`.
+1. `value` is the name of the column/variable in the "tidy" frame that contains the rows and columns of values in the original data frame you want to tidy. Observe how we set `value = servings` and in the resulting `drinks_smaller_tidy` the column `servings` contains the 4 $\times$ 3 = 12 numerical values.
+1. 
The third argument specifies the columns you either want to or don't want to tidy. Observe how we set this to `-country`, indicating that we don't want to tidy the `country` variable in `drinks_smaller` and rather only `beer`, `spirit`, and `wine`.
+
+The third argument is a little nuanced, so let's consider another example. Note the code below is very similar, but now the third argument specifies which columns we'd want to tidy, `c(beer, spirit, wine)`, instead of the columns we don't want to tidy, `-country`. Note the use of `c()` to create a vector of the columns in `drinks_smaller` that we'd like to tidy. If you run the code below, you'll see that the resulting `drinks_smaller_tidy` is the same.
+
+```{r, eval=FALSE}
+drinks_smaller_tidy <- drinks_smaller %>% 
+  gather(key = type, value = servings, c(beer, spirit, wine))
+drinks_smaller_tidy
+```
+
+With our `drinks_smaller_tidy` "tidy" format data frame, we can now produce a side-by-side AKA dodged barplot using `geom_col()` and not `geom_bar()`, since we would like to map the `servings` variable to the `y`-aesthetic of the bars.

```{r}
ggplot(drinks_smaller_tidy, aes(x=country, y=servings, fill=type)) + 
@@ -286,36 +320,34 @@ Converting "wide" format data to "tidy" format often confuses new R users. The o
 **_Learning check_**
```

-**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Consider the following data frame of average number of servings of beer, spirits, and wine consumption in three countries as reported in the FiveThirtyEight article [Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?](https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/)
+**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Take a look at the `airline_safety` data frame included in the `fivethirtyeight` package. Run the following:

-```{r echo=FALSE}
-drinks_sub <- drinks %>%
-  select(-total_litres_of_pure_alcohol) %>%
-  filter(country %in% c("USA", "Canada", "South Korea"))
-drinks_sub_tidy <- drinks_sub %>%
-  gather(type, servings, -c(country)) %>%
-  mutate(
-    type = str_sub(type, start=1, end=-10)
-  ) %>%
-  arrange(country, type) %>%
-  rename(`alcohol type` = type)
-drinks_sub
+```{r, eval=FALSE}
+airline_safety
```

-This data frame is not in tidy format. What would it look like if it were?
+After reading the help file by running `?airline_safety`, we see that `airline_safety` is a data frame containing information on different airline companies' safety records. This data was originally reported on the data journalism website FiveThirtyEight.com in Nate Silver's article ["Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?"](https://fivethirtyeight.com/features/should-travelers-avoid-flying-airlines-that-have-had-crashes-in-the-past/). Let's ignore the `incl_reg_subsidiaries` and `avail_seat_km_per_week` variables for simplicity:
+
+```{r}
+airline_safety_smaller <- airline_safety %>% 
+  select(-c(incl_reg_subsidiaries, avail_seat_km_per_week))
+airline_safety_smaller
+```
+
+This data frame is not in "tidy" format. How would you convert this data frame to be in "tidy" format, in particular so that it has a variable `incident_type_years` indicating the incident type/year and a variable `count` of the counts?
```{block, type='learncheck', purl=FALSE} ``` ---- +*** ### `nycflights13` package -Recall the `nycflights13` package with data about all domestic flights departing from New York City in 2013 that we introduced in Section \@ref(nycflights13) and used extensively in Chapter \@ref(viz) to create visualizations. In particular, let's revisit the `flights` data frame by running `View(flights)` in your console. We see that `flights` has a rectangular shape with each row corresponding to a different flight and each column corresponding to a characteristic of that flight. This matches exactly with how Hadley Wickham defined tidy data: +Recall the `nycflights13` package with data about all domestic flights departing from New York City in 2013 that we introduced in Section \@ref(nycflights13) and used extensively in Chapter \@ref(viz) on data visualization and Chapter \@ref(wrangling) on data wrangling. Let's revisit the `flights` data frame by running `View(flights)`. We saw that `flights` has a rectangular shape with each of its `r scales::comma(nrow(flights))` rows corresponding to a flight and each of its `r ncol(flights)` columns corresponding to different characteristics/measurements of each flight. This matches exactly with our definition of "tidy" data from above. 1. Each variable forms a column. 2. Each observation forms a row. @@ -324,56 +356,27 @@ But what about the third property of "tidy" data? > 3. Each type of observational unit forms a table. -**Observational units**: - -We identified earlier that the observational unit in the `flights` dataset is an individual flight. And we have shown that this dataset consists of `r scales::comma(nrow(flights))` flights with `r ncol(flights)` variables. In other words, rows of this dataset don't refer to a measurement on an airline or on an airport; they refer to characteristics/measurements on a given flight from New York City in 2013. - -Also included in the `nycflights13` package are datasets with different observational units [@R-nycflights13]: - -* `airlines`: translation between two letter IATA carrier codes and names (`r nrow(nycflights13::airlines)` in total) -* `planes`: construction information about each of `r scales::comma(nrow(nycflights13::planes))` planes used -* `weather`: hourly meteorological data (about `r nycflights13::weather %>% count(origin) %>% .[["n"]] %>% mean() %>% round()` observations) for each of the three NYC airports -* `airports`: airport names and locations - -The organization of this data follows the third "tidy" data property: observations corresponding to the same observational unit should be saved in the same table/data frame. Another example involves a spreadsheet of all students enrolled in a university along with information about them, such as name, gender, and date of birth. Each row represents an individual student, which is the observational unit in question. - -**Identification vs measurement variables**: - -There is a subtle difference between the kinds of variables that you will encounter in data frames: *measurement variables* and *identification variables*. The `airports` data frame you worked with above contains both these types of variables. Recall that in `airports` the observational unit is an airport, and thus each row corresponds to one particular airport. Let's pull them apart using the `glimpse` function: - -```{r} -glimpse(airports) -``` - -The variables `faa` and `name` are what we will call *identification variables*: variables that uniquely identify each observational unit. 
They are mainly used to provide a unique name to each observational unit, thereby allowing us to uniquely identify them. `faa` gives the unique code provided by the FAA for that airport, while the `name` variable gives the longer more natural name of the airport. The remaining variables (`lat`, `lon`, `alt`, `tz`, `dst`, `tzone`) are often called *measurement* or *characteristic* variables: variables that describe properties of each observational unit, in other words each observation in each row. For example, `lat` and `long` describe the latitude and longitude of each airport. - -So in our above example of a spreadsheet of all students enrolled at a university, email address could be treated as an identical variable since it uniquely identifies each observational unit i.e. each student, while date of birth could not since it is possible (and highly probable) that two students share the same birthday. - -Furthermore, sometimes a single variable might not be enough to uniquely identify each observational unit: combinations of variables might be needed (see Learning Check below). While it is not an absolute rule, for organizational purposes it is considered good practice to have your identification variables in the far left-most columns of your data frame. +Recall that we also saw in Section \@ref(exploredataframes) that the observational unit for the `flights` data frame is an individual flight. In other words, the rows of the `flights` data frame refer to characteristics/measurements of individual flights. Also included in the `nycflights13` package are other data frames with their rows representing different observational units [@R-nycflights13]: -```{block lc3-3c, type='learncheck'} -**_Learning check_** -``` - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What properties of the observational unit do each of `lat`, `lon`, `alt`, `tz`, `dst`, and `tzone` describe for the `airports` data frame? Note that you may want to use `?airports` to get more information. +* `airlines`: translation between two letter IATA carrier codes and names (`r nrow(nycflights13::airlines)` in total). i.e. the observational unit is an airline company. +* `planes`: construction information about each of `r scales::comma(nrow(nycflights13::planes))` planes used. i.e. the observational unit is an aircraft. +* `weather`: hourly meteorological data (about `r nycflights13::weather %>% count(origin) %>% .[["n"]] %>% mean() %>% round()` observations) for each of the three NYC airports. i.e. the observational unit is an hourly measurement. +* `airports`: airport names and locations. i.e. the observational unit is an airport. -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions. - -```{block, type='learncheck', purl=FALSE} -``` +The organization of the information into these five data frames follow the third "tidy" data property: observations corresponding to the same observational unit should be saved in the same table i.e. data frame. You could think of this property as the old English expression: "birds of a feather flock together." - - ---- +*** ## Case study: Democracy in Guatemala {#case-study-tidy} -In this section, we'll show you another example of how to convert a dataset that isn't in "tidy" format i.e. "wide" format, to a dataset that is in "tidy" format i.e. 
"long/narrow" format using the `gather()` function from the `tidyr` package.. Let's use the `dem_score` data frame we imported in Section \@ref(csv), but focus on only data corresponding to the country of Guatemala. +In this section, we'll show you another example of how to convert a data frame that isn't in "tidy" format i.e. "wide" format, to a data frame that is in "tidy" format i.e. "long/narrow" format. We'll do this using the `gather()` function from the `tidyr` package again. Furthermore, we'll make use of some of the `ggplot2` data visualization and `dplyr` data wrangling tools you learned in Chapters \@ref(viz) and \@ref(wrangling). + +Let's use the `dem_score` data frame we imported in Section \@ref(csv), but focus on only data corresponding to Guatemala. ```{r} guat_dem <- dem_score %>% @@ -381,51 +384,49 @@ guat_dem <- dem_score %>% guat_dem ``` -Now let's produce a plot showing how the democracy scores have changed over the 40 years from 1952 to 1992 for Guatemala. Let's start by laying out how we would map our aesthetics to variables in the data frame: - -- The `data` frame is `guat_dem` by setting `data = guat_dem` - -What are the names of the variables to plot? We'd like to see how the democracy score has changed over the years. Now we are stuck in a predicament. We see that we have a variable named `country` but its only value is `"Guatemala"`. We have other variables denoted by different year values. Unfortunately, we've run into a dataset that is not in the appropriate format to apply the Grammar of Graphics and `ggplot2`. Remember that `ggplot2` is a package in the `tidyverse` and, thus, needs data to be in a tidy format. We'd like to finish off our mapping of aesthetics to variables by doing something like +Now let's produce a *time-series plot* showing how the democracy scores have changed over the 40 years from 1952 to 1992 for Guatemala. Recall that we saw time-series plot in Section \@ref(linegraphs) on creating linegraphs using `geom_line()`. Let's lay out the Grammar of Graphics we saw in Section \@ref(grammarofgraphics). -- The `aes`thetic mapping is set by `aes(x = year, y = democracy_score)` +First we know we need to set `data = guat_dem` and use a `geom_line()` layer, but what is the aesthetic mapping of variables. We'd like to see how the democracy score has changed over the years, so we need to map: -but this is not possible with our wide-formatted data. We need to take the values of the current column names in `guat_dem` (aside from `country`) and convert them into a new variable that will act as a key called `year`. Then, we'd like to take the numbers on the inside of the table and turn them into a column that will act as values called `democracy_score`. Our resulting data frame will have three columns: `country`, `year`, and `democracy_score`. +* `year` to the x-position aesthetic and +* `democracy_score` to the y-position aesthetic -The `gather()` function in the `tidyr` package can complete this task for us. The first argument to `gather()`, just as with `ggplot2()`, is the `data` argument where we specify which data frame we would like to tidy. The next two arguments to `gather()` are `key` and `value`, which specify what we'd like to call the new columns that convert our wide data into long format. Lastly, we include a specification for variables we'd like to NOT include in this tidying process using a `-`. +Now we are stuck in a predicament, much like with our `drinks_smaller` example in Section \@ref(tidy-data-ex). 
We see that we have a variable named `country`, but its only value is `"Guatemala"`. We have other variables denoted by different year values. Unfortunately, the `guat_dem` data frame is not "tidy" and hence is not in the appropriate format to apply the Grammar of Graphics and thus we cannot use the `ggplot2` package. We need to take the values of the columns corresponding to years in `guat_dem` and convert them into a new "key" variable called `year`. Furthermore, we'd like to take the democracy scores on the inside of the table and turn them into a new "value" variable called `democracy_score`. Our resulting data frame will thus have three columns: `country`, `year`, and `democracy_score`. - - - +Recall that the `gather()` function in the `tidyr` package can complete this task for us: ```{r} -guat_tidy <- guat_dem %>% +guat_dem_tidy <- guat_dem %>% gather(key = year, value = democracy_score, -country) -guat_tidy +guat_dem_tidy ``` -We can now create the plot to show how the democracy score of Guatemala changed from 1952 to 1992 using a linegraph and `ggplot2`. +We set the arguments to `gather()` as follows: -```{r errors=TRUE} -ggplot(guat_tidy, aes(x = year, y = democracy_score)) + - geom_line() -``` +1. `key` is the name of the column/variable in the new "tidy" frame that contains the column names of the original data frame that you want to tidy. Observe how we set `key = year` and in the resulting `guat_dem_tidy` the column `year` contains the years where the Guatemala's democracy score were measured. +1. `value` is the name of the column/variable in the "tidy" frame that contains the rows and columns of values in the original data frame you want to tidy. Observe how we set `value = democracy_score` and in the resulting `guat_dem_tidy` the column `democracy_score` contains the 1 $\times$ 9 = 9 democracy scores. +1. The third argument are the columns you either want to or don't want to tidy. Observe how we set this to `-country` indicating that we don't want to tidy the `country` variable in `guat_dem` and rather only `1952` through `1992`. - + -Observe that the `year` variable in `guat_tidy` is stored as a character vector since we had to circumvent the naming rules in R by adding backticks around the different year columns in `guat_dem`. This is leading to `ggplot` not knowing exactly how to plot a line using a categorical variable. We can fix this by using the `parse_number()` function in the `readr` package and then specify the horizontal axis label to be `"year"`: +However, observe in the output for `guat_dem_tidy` that the `year` variable is of type `chr` or character. Before we can plot this variable on the x-axis, we need to convert it into a numerical variable using the `as.numeric()` function within the `mutate()` function, which we saw in Section \@ref(mutate) on mutating existing variables to create new ones. -```{r guatline, fig.cap="Guatemala's democracy score ratings from 1952 to 1992"} -ggplot(guat_tidy, aes(x = parse_number(year), y = democracy_score)) + - geom_line() + - labs(x = "year") +```{r} +guat_dem_tidy <- guat_dem_tidy %>% + mutate(year = as.numeric(year)) ``` -We'll see in Chapter \@ref(wrangling) how we could use the `mutate()` function to change `year` to be a numeric variable instead after we have done our tidying. 
Notice now that the mappings of aesthetics to variables make sense in Figure \@ref(fig:guatline): +We can now create the plot to show how the democracy score of Guatemala changed from 1952 to 1992 using a `geom_line()`: + +```{r errors=TRUE} +ggplot(guat_dem_tidy, aes(x = year, y = democracy_score)) + + geom_line() + + labs(x = "Year", y = "Democracy Score", title = "Democracy score in Guatemala 1952-1992") +``` -- The `data` frame is `guat_tidy` by setting `data = dem_score` -- The `x` `aes`thetic is mapped to `year` -- The `y` `aes`thetic is mapped to `democracy_score` -- The `geom_`etry chosen is `line` ```{block lc-tidying, type='learncheck', purl=FALSE} **_Learning check_** @@ -441,7 +442,7 @@ a tidy data frame and assign the name of `dem_score_tidy` to the resulting long- ---- +*** @@ -449,7 +450,7 @@ a tidy data frame and assign the name of `dem_score_tidy` to the resulting long- ### `tidyverse` package -Notice at the beginning of the Chapter we loaded the following four packages: +Notice at the beginning of the chapter we loaded the following four packages, which are among the four of the most frequently used R packages for data science: ```{r, eval=FALSE} library(dplyr) @@ -458,7 +459,7 @@ library(readr) library(tidyr) ``` -In fact, these are among the four of the most frequently used R packages for data science. There is a much quicker way to load these packages than by individually loading them as we did above. We can install and load the `tidyverse` package. The `tidyverse` package acts as an "umbrella" package whereby installing/loading it will install/load multiple packages at once for you. So that after installing the `tidyverse` package as you would a normal package, running this: +There is a much quicker way to load these packages than by individually loading them as we did above: by installing and loading the `tidyverse` package. The `tidyverse` package acts as an "umbrella" package whereby installing/loading it will install/load multiple packages at once for you. So after installing the `tidyverse` package as you would a normal package, running this: ```{r, eval=FALSE} library(tidyverse) @@ -479,44 +480,10 @@ library(forcats) You've seen the first 4 of the these packages: `ggplot2` for data visualization, `dplyr` for data wrangling, `tidyr` for converting data to "tidy" format, and `readr` for importing spreadsheet data into R. The remaining packages (`purrr`, `tibble`, `stringr`, and `forcats`) are left for a more advanced book; check out [R for Data Science](http://r4ds.had.co.nz/) to learn about these packages. -The `tidyverse` "umbrella" package gets its name from the fact that all functions in all its constituent packages are designed to that all inputs/argument data frames are in "tidy" format and all output data frames are in "tidy" format as well. This acts as a standardization to make transitions between the various functions in these packages as seamless as possible. +The `tidyverse` "umbrella" package gets its name from the fact that all functions in all its constituent packages are designed to that all inputs/argument data frames are in "tidy" format and all output data frames are in "tidy" format as well. This standardization of input and output data frames makes transitions between the various functions in these packages as seamless as possible. - - -### Optional: Normal forms of data - -The datasets included in the `nycflights13` package are in a form that minimizes redundancy of data. 
We will see that there are ways to _merge_ (or _join_) the different tables together easily. We are capable of doing so because each of the tables have _keys_ in common to relate one to another. This is an important property of **normal forms** of data. The process of decomposing data frames into less redundant tables without losing information is called **normalization**. More information is available on [Wikipedia](https://en.wikipedia.org/wiki/Database_normalization). - -We saw an example of this above with the `airlines` dataset. While the `flights` data frame could also include a column with the names of the airlines instead of the carrier code, this would be repetitive since there is a unique mapping of the carrier code to the name of the airline/carrier. - -Below an example is given showing how to **join** the `airlines` data frame together with the `flights` data frame by linking together the two datasets via a common **key** of `"carrier"`. Note that this "joined" data frame is assigned to a new data frame called `joined_flights`. The **key** variable that we frequently join by is one of the *identification variables* mentioned above. - -```{r message=FALSE} -joined_flights <- inner_join(x = flights, y = airlines, by = "carrier") -``` - -```{r eval=FALSE} -View(joined_flights) -``` - -If we `View()` this dataset, we see a new variable has been created called `name`. (We will see in Subsection \@ref(rename) ways to change `name` to a more descriptive variable name.) More discussion about joining data frames together will be given in Chapter \@ref(wrangling). We will see there that the names of the columns to be linked need not match as they did here with `"carrier"`. - -```{block tidy_review, type='learncheck'} -**_Learning check_** -``` - - **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are common characteristics of "tidy" datasets? - - **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What makes "tidy" datasets useful for organizing data? - - **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are some advantages of data in normal forms? What are some disadvantages? - -```{block, type='learncheck', purl=FALSE} -``` - - ### Additional resources An R script file of all R code used in this chapter is available [here](scripts/05-tidy.R). @@ -538,7 +505,7 @@ Review questions have been designed using the `fivethirtyeight` R package [@R-fi ### What's to come? -Congratulations! We've completed the "Data Science via the tidyverse" portion of this book! We'll now move to the "data modeling" portion in Chapters \@ref(regression) and \@ref(multiple-regression), where you'll leverage your data visualization and wrangling skills to model relationships between different variables in datasets. However, we're going to leave the Chapter \@ref(inference-for-regression) on "Inference for Regression" until after we've covered statistical inference. +Congratulations! We've completed the "Data Science via the tidyverse" portion of this book! We'll now move to the "data modeling" portion in Chapters \@ref(regression) and \@ref(multiple-regression), where you'll leverage your data visualization and wrangling skills to model relationships between different variables in data frames. However, we're going to leave the Chapter \@ref(inference-for-regression) on "Inference for Regression" until after we've covered statistical inference. 
```{r echo=FALSE, fig.cap="ModernDive flowchart - On to Part II!", fig.align='center'} knitr::include_graphics("images/flowcharts/flowchart/flowchart.005.png") diff --git a/06-regression.Rmd b/06-regression.Rmd index 3078a6a86..5dd311860 100755 --- a/06-regression.Rmd +++ b/06-regression.Rmd @@ -10,30 +10,21 @@ rq <- 0 # **`r paste0("(RQ", chap, ".", (rq <- rq + 1), ")")`** knitr::opts_chunk$set( - tidy = FALSE, - out.width = "\\textwidth", - message = FALSE, + tidy = FALSE, + out.width = '\\textwidth', + fig.height = 4, + fig.align='center', warning = FALSE - ) +) + options(scipen = 99, digits = 3) -# This bit of code is a bug fix on asis blocks, which we use to show/not show LC -# solutions, which are written like markdown text. In theory, it shouldn't be -# necessary for knitr versions <=1.11.6, but I've found I still need to for -# everything to knit properly in asis blocks. More info here: -# https://stackoverflow.com/questions/32944715/conditionally-display-block-of-markdown-text-using-knitr -library(knitr) -knit_engines$set(asis = function(options) { - if (options$echo && options$eval) knit_child(text = options$code) -}) +# In knitr::kable printing replace all NA's with blanks +options(knitr.kable.NA = '') -# This controls which LC solutions to show. Options for solutions_shown: "ALL" -# (to show all solutions), or subsets of c('5-1', '5-2','5-3', '5-4'), including -# the null vector c("") to show no solutions. -solutions_shown <- c("") -show_solutions <- function(section){ - return(solutions_shown == "ALL" | section %in% solutions_shown) - } +# Set random number generator see value for replicable pseudorandomness. Why 76? +# https://www.youtube.com/watch?v=xjJ7FheCkCU +set.seed(76) ``` @@ -82,7 +73,6 @@ library(gapminder) library(skimr) ``` - ```{r, message=FALSE, warning=FALSE, echo=FALSE} library(ggplot2) library(dplyr) @@ -104,16 +94,8 @@ library(kableExtra) ``` -### DataCamp {-} - -The introductory basic regression analysis below was the inspiration for a large part of ModernDive co-author [Albert Y. Kim's](https://twitter.com/rudeboybert) DataCamp course "Modeling with Data in the Tidyverse." If you're interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 "Introduction to Modeling" and Chapter 2 "Modeling with Basic Regression". -```{r, echo=FALSE, results='asis', purl=FALSE} -image_link(path = "images/datacamp_working_with_data.png", - link = "https://www.datacamp.com/courses/working-with-data-in-the-tidyverse", - html_opts = "height: 150px;", - latex_opts = "width=0.3\\textwidth") -``` +*** @@ -553,6 +535,12 @@ Just as we did for the 21st instructor in the `evals_ch6` dataset (in the first More development of this idea appears in Section \@ref(leastsquares) and we encourage you to read that section after you investigate residuals. + + +*** + + + ## One categorical explanatory variable {#model2} It's an unfortunate truth that life expectancy is not the same across various countries in the world; there are a multitude of factors that are associated with how long people live. International development agencies are very interested in studying these differences in the hope of understanding where governments should allocate resources to address this problem. 
In this section, we'll explore differences in life expectancy in two ways: @@ -639,7 +627,7 @@ ggplot(gapminder2007, aes(x = lifeExp)) + title = "Worldwide life expectancy") ``` -We see that this data is left-skewed/negatively skewed: there are a few countries with very low life expectancies that are bringing down the mean life expectancy. However, the median is less sensitive to the effects of such outliers. Hence the median is greater than the mean in this case. Let's proceed by comparing median and mean life expectancy between continents by adding a `group_by(continent)` to the above code: +We see that this data is left-skewed/negatively skewed: there are a few countries with very low life expectancy that are bringing down the mean life expectancy. However, the median is less sensitive to the effects of such outliers. Hence the median is greater than the mean in this case. Let's proceed by comparing median and mean life expectancy between continents by adding a `group_by(continent)` to the above code: ```{r, eval=TRUE} lifeExp_by_continent <- gapminder2007 %>% @@ -665,9 +653,9 @@ n_countries <- gapminder2007 %>% nrow() n_countries_africa <- gapminder2007 %>% filter(continent == "Africa") %>% nrow() ``` -We see now that there are differences in life expectancies between the continents. For example let's focus on only medians. While the median life expectancy across all $n = `r n_countries`$ countries in 2007 was `r lifeExp_worldwide$median %>% round(3)`, the median life expectancy across the $n =`r n_countries_africa`$ countries in Africa was only `r median_africa`. +We see now that there are differences in life expectancy between the continents. For example let's focus on only medians. While the median life expectancy across all $n = `r n_countries`$ countries in 2007 was `r lifeExp_worldwide$median %>% round(3)`, the median life expectancy across the $n =`r n_countries_africa`$ countries in Africa was only `r median_africa`. -Let's create a corresponding visualization. One way to compare the life expectancies of countries in different continents would be via a faceted histogram. Recall we saw back in the Data Visualization chapter, specifically Section \@ref(facets), that facets allow us to split a visualization by the different levels of a categorical variable or factor variable. In Figure \@ref(fig:catxplot0b), the variable we facet by is `continent`, which is categorical with five levels, each corresponding to the five continents of the world. +Let's create a corresponding visualization. One way to compare the life expectancy of countries in different continents would be via a faceted histogram. Recall we saw back in the Data Visualization chapter, specifically Section \@ref(facets), that facets allow us to split a visualization by the different levels of a categorical variable or factor variable. In Figure \@ref(fig:catxplot0b), the variable we facet by is `continent`, which is categorical with five levels, each corresponding to the five continents of the world. ```{r catxplot0b, warning=FALSE, fig.cap="Life expectancy in 2007"} ggplot(gapminder2007, aes(x = lifeExp)) + @@ -677,7 +665,7 @@ ggplot(gapminder2007, aes(x = lifeExp)) + facet_wrap(~ continent, nrow = 2) ``` -Another way would be via a `geom_boxplot` where we map the categorical variable `continent` to the $x$-axis and the different life expectancies within each continent on the $y$-axis; we do this in Figure \@ref(fig:catxplot1). 
+Another way would be via a `geom_boxplot` where we map the categorical variable `continent` to the $x$-axis and the different life expectancy within each continent on the $y$-axis; we do this in Figure \@ref(fig:catxplot1). ```{r catxplot1, warning=FALSE, fig.cap="Life expectancy in 2007"} ggplot(gapminder2007, aes(x = continent, y = lifeExp)) + @@ -693,7 +681,7 @@ It’s important to remember however that the solid lines in the middle of the b * Africa and Asia have much more spread/variation in life expectancy as indicated by the interquartile range (the height of the boxes). * Oceania has almost no spread/variation, but this might in large part be due to the fact there are only two countries in Oceania: Australia and New Zealand. -Now, let's start making comparisons of life expectancy *between* continents. Let's use Africa as a *baseline for comparsion*. Why Africa? Only because it happened to be first alphabetically, we could have just as appropriately used the Americas as the baseline for comparison. Using the "eyeball test" (just using our eyes to see if anything stands out), we make the following observations about differences in median life expectancy compared to the baseline of Africa: +Now, let's start making comparisons of life expectancy *between* continents. Let's use Africa as a *baseline for comparison*. Why Africa? Only because it happened to be first alphabetically, we could have just as appropriately used the Americas as the baseline for comparison. Using the "eyeball test" (just using our eyes to see if anything stands out), we make the following observations about differences in median life expectancy compared to the baseline of Africa: 1. The median life expectancy of the Americas is roughly 20 years greater. 1. The median life expectancy of Asia is roughly 20 years greater. @@ -811,7 +799,7 @@ Now let's interpret the terms in the estimate column of the regression table. Fi i.e. All four of the indicator variables are equal to 0. Recall we stated earlier that we would treat Africa as the baseline for comparison group. Furthermore, this value corresponds to the group mean life expectancy for all African countries in Table \@ref(tab:continent-mean-life-expectancies). -Next, $b_{\text{Amer}}$ = `continentAmericas = 18.8` is the difference in mean life expectancies of countries in the Americas relative to Africa, or in other words, on average countries in the Americas had life expectancy 18.8 years greater. The fitted value yielded by this equation is: +Next, $b_{\text{Amer}}$ = `continentAmericas = 18.8` is the difference in mean life expectancy of countries in the Americas relative to Africa, or in other words, on average countries in the Americas had life expectancy 18.8 years greater. The fitted value yielded by this equation is: \begin{align} @@ -827,7 +815,7 @@ Next, $b_{\text{Amer}}$ = `continentAmericas = 18.8` is the difference in mean l i.e. in this case, only the indicator function $\mathbb{1}_{\mbox{Amer}}(x)$ is equal to 1, but all others are 0. Recall that 72.9 corresponds to the group mean life expectancy for all countries in the Americas in Table \@ref(tab:continent-mean-life-expectancies). -Similarly, $b_{\text{Asia}}$ = `continentAsia = 15.9` is the difference in mean life expectancies of Asian countries relative to Africa countries, or in other words, on average countries in the Asia had life expectancy 18.8 years greater than Africa. 
The fitted value yielded by this equation is:
+Similarly, $b_{\text{Asia}}$ = `continentAsia = 15.9` is the difference in mean life expectancy of Asian countries relative to African countries, or in other words, on average countries in Asia had life expectancy 15.9 years greater than Africa. The fitted value yielded by this equation is:
\begin{align}
@@ -918,6 +906,11 @@
$-26.9 = 43.8 - 70.7$ is Afghanistan's mean life expectancy minus the mean life expectancy of all Asian countries.
+
+***
+
+
+
## Related topics
### Correlation coefficient {#correlationcoefficient}
@@ -1167,12 +1160,17 @@ In this case, it outputs only variables of interest to us as new regression mode If you're even more curious, take a look at the source code for these functions on [GitHub](https://github.com/moderndive/moderndive/blob/master/R/regression_functions.R).
-## Conclusion
-In this chapter, you've seen what we call "basic regression" when you only have one explanatory variable. In Chapter \@ref(multiple-regression), we'll study *multiple regression* where we have more than one explanatory variable! In particular, we'll see why we've been conducting the residual analyses from Subsections \@ref(model1residuals) and \@ref(model2residuals). We are actually verifying some very important assumptions that must be met for the `std_error` (standard error), `p_value`, `lower_ci` and `upper_ci` (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. Again, don't worry for now if you don't understand what these terms mean. After the next chapter on multiple regression, we'll dive in!
+***
-### Script of R code
+
+## Conclusion
+
+### Additional resources
An R script file of all R code used in this chapter is available [here](scripts/06-regression.R).
+### What's to come?
+
+In this chapter, you've seen what we call "basic regression" when you only have one explanatory variable. In Chapter \@ref(multiple-regression), we'll study *multiple regression* where we have more than one explanatory variable! In particular, we'll see why we've been conducting the residual analyses from Subsections \@ref(model1residuals) and \@ref(model2residuals). We are actually verifying some very important assumptions that must be met for the `std_error` (standard error), `p_value`, `lower_ci` and `upper_ci` (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. Again, don't worry for now if you don't understand what these terms mean. After the next chapter on multiple regression, we'll dive in!
diff --git a/07-multiple-regression.Rmd b/07-multiple-regression.Rmd
index 6f9241794..b307c1e23 100644
--- a/07-multiple-regression.Rmd
+++ b/07-multiple-regression.Rmd
@@ -1,4 +1,3 @@
-
# Multiple Regression {#multiple-regression}
```{r, include=FALSE, purl=FALSE} chap <- 7 lc <- 0 rq <- 0 # **`r paste0("(RQ", chap, ".", (rq <- rq + 1), ")")`**
knitr::opts_chunk$set( - tidy = FALSE, - out.width = "\\textwidth", - message = FALSE, + tidy = FALSE, + out.width = '\\textwidth', + fig.height = 4, + fig.align='center', warning = FALSE - ) +) +
options(scipen = 99, digits = 3)
-# This bit of code is a bug fix on asis blocks, which we use to show/not show LC
-# solutions, which are written like markdown text. In theory, it shouldn't be
-# necessary for knitr versions <=1.11.6, but I've found I still need to for
-# everything to knit properly in asis blocks.
More info here: -# https://stackoverflow.com/questions/32944715/conditionally-display-block-of-markdown-text-using-knitr -library(knitr) -knit_engines$set(asis = function(options) { - if (options$echo && options$eval) knit_child(text = options$code) -}) - -# This controls which LC solutions to show. Options for solutions_shown: "ALL" -# (to show all solutions), or subsets of c('5-1', '5-2','5-3', '5-4'), including -# the null vector c("") to show no solutions. -solutions_shown <- c("") -show_solutions <- function(section){ - return(solutions_shown == "ALL" | section %in% solutions_shown) - } +# In knitr::kable printing replace all NA's with blanks +options(knitr.kable.NA = '') + +# Set random number generator see value for replicable pseudorandomness. Why 76? +# https://www.youtube.com/watch?v=xjJ7FheCkCU +set.seed(76) ``` In Chapter \@ref(regression) we introduced ideas related to modeling, in particular that the fundamental premise of modeling is *to make explicit the relationship* between an outcome variable $y$ and an explanatory/predictor variable $x$. Recall further the synonyms that we used to also denote $y$ as the dependent variable and $x$ as an independent variable or covariate. @@ -62,7 +52,6 @@ library(ISLR) # library(skimr) (Causes problems with table linking) ``` - ```{r, message=FALSE, warning=FALSE, echo=FALSE} # Packages needed internally, but not in text: library(mvtnorm) @@ -74,13 +63,8 @@ library(patchwork) ``` -### DataCamp {-} -The approach taken below of using more than one variable of information in models using multiple regression is identical to that taken in ModernDive co-author [Albert Y. Kim's](https://twitter.com/rudeboybert) DataCamp course "Modeling with Data in the Tidyverse." If you're interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 "Introduction to Modeling" and Chapter 3 "Modeling with Multiple Regression." - -```{r, echo=FALSE, results='asis'} -image_link(path = "images/datacamp_working_with_data.png", link = "https://www.datacamp.com/courses/working-with-data-in-the-tidyverse", html_opts = "height: 150px;", latex_opts = "width=0.3\\textwidth") -``` +*** @@ -422,6 +406,11 @@ Recall the format of the output: * `residual` corresponds to $y - \widehat{y}$ (the residual) + +*** + + + ## One numerical & one categorical explanatory variable {#model4} Let's revisit the instructor evaluation data introduced in Section \@ref(model1), where we studied the relationship between instructor evaluation scores and their beauty scores. This analysis suggested that there is a positive relationship between `bty_avg` and `score`, in other words as instructors had higher beauty scores, they also tended to have higher teaching evaluation scores. Now let's say instead of `bty_avg` we are interested in the numerical explanatory variable $x_1$ `age` and furthermore we want to use a second explanatory variable $x_2$, the (binary) categorical variable `gender`. 
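Before working through the details, here is a rough sketch of where this section is headed. This is not the exact code used later in the section: it assumes the instructor evaluations data is available as the `evals` data frame from the `moderndive` package with the variables `score`, `age`, and `gender` described above (the section may use a differently named, `select()`ed copy), and the object name `score_model_age_gender` is ours for illustration only.

```{r, eval=FALSE}
# Sketch only: a regression model with one numerical explanatory variable
# (age) and one categorical explanatory variable (gender). The data frame
# and object names used later in this section may differ.
score_model_age_gender <- lm(score ~ age + gender, data = evals)
get_regression_table(score_model_age_gender)
```

As with the basic regression models of Chapter \@ref(regression), `get_regression_table()` from the `moderndive` package turns the fitted `lm()` object into a tidy regression table.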
@@ -526,7 +515,6 @@ get_regression_table(score_model_2) %>% The modeling equation for this scenario is: - \begin{align} \widehat{y} &= b_0 + b_1 \cdot x_1 + b_2 \cdot x_2 \\ \widehat{\mbox{score}} &= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot \mathbb{1}_{\mbox{is male}}(x) @@ -668,6 +656,10 @@ Recall the format of the output: +*** + + + ## Related topics ### More on the correlation coefficient {#correlationcoefficient2} @@ -788,17 +780,23 @@ ggplot(Credit, aes(x = Income, y = Balance)) + --> + +*** + + + ## Conclusion -### What's to come? +### Additional resources + +An R script file of all R code used in this chapter is available [here](scripts/07-multiple-regression.R). -Congratulations! We're ready to proceed to the third portion of this book: "statistical inference" using a new package called `infer`. Once we've covered Chapters \@ref(sampling) on sampling, \@ref(confidence-intervals) on confidence intervals, and \@ref(hypothesis-testing) on hypothesis testing, we'll come back to the models we've seen in "data modeling" in Chapter \@ref(inference-for-regression) on inference for regression. As we said at the end of Chapter \@ref(regression), we'll see why we've been conducting the residual analyses from Subsections \@ref(model3residuals) and \@ref(model4residuals). We are actually verifying some very important assumptions that must be met for the `std_error` (standard error), `p_value`, `conf_low` and `conf_high` (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. +### What's to come? -Up next: +Congratulations! We're ready to proceed to the third portion of this book: "statistical inference" using a new package called `infer`. Once we've covered Chapters \@ref(sampling) on sampling, \@ref(confidence-intervals) on confidence intervals, and \@ref(hypothesis-testing) on hypothesis testing, we'll come back to the models we've seen in "data modeling" in Chapter \@ref(inference-for-regression) on inference for regression. As we said at the end of Chapter \@ref(regression), we'll see why we've been conducting the residual analyses from Subsections \@ref(model3residuals) and \@ref(model4residuals). We are actually verifying some very important assumptions that must be met for the `std_error` (standard error), `p_value`, `conf_low` and `conf_high` (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. Up next:
-### Script of R code -An R script file of all R code used in this chapter is available [here](scripts/07-multiple-regression.R). + diff --git a/08-sampling.Rmd b/08-sampling.Rmd index 8e0c94c39..ea45da4e1 100644 --- a/08-sampling.Rmd +++ b/08-sampling.Rmd @@ -1,4 +1,4 @@ -# (PART) Inference via infer {-} +# (PART) Statistical inference via infer {-} # Sampling {#sampling} @@ -26,6 +26,7 @@ set.seed(76) In this chapter we kick off the third segment of this book, statistical inference, by learning about **sampling**. The concepts behind sampling form the basis of confidence intervals and hypothesis testing, which we'll cover in Chapters \@ref(confidence-intervals) and \@ref(hypothesis-testing) respectively. We will see that the tools that you learned in the data science segment of this book, in particular data visualization and data wrangling, will also play an important role here in the development of your understanding. As mentioned before, the concepts throughout this text all build into a culmination allowing you to "think with data." + ### Needed packages {-} Let's load all the packages needed for this chapter (this assumes you've already installed them). If needed, read Section \@ref(packages) for information on how to install and load R packages. @@ -42,21 +43,24 @@ library(knitr) library(kableExtra) library(patchwork) library(readr) +library(stringr) ``` ---- +*** ## Sampling activity {#sampling-activity} -Let's start with a hand-on activity. +Let's start with a hands-on activity. ### What proportion of this bowl's balls are red? -Take a look at the bowl in Figure \@ref(fig:sampling-exercise-1). It has a certain number of red and and a certain number of white balls, all of equal size. What proportion of this bowl's balls are red? +Take a look at the bowl in Figure \@ref(fig:sampling-exercise-1). It has a certain number of red and a certain number of white balls all of equal size. Furthermore, it appears the bowl has been mixed beforehand as there does not seem to be any particular pattern to the spatial distribution of red and white balls. + +Let's now ask ourselves, what proportion of this bowl's balls are red? ```{r sampling-exercise-1, echo=FALSE, fig.cap="A bowl with red and white balls.", purl=FALSE, out.width = "80%"} knitr::include_graphics("images/sampling_bowl_1.jpg") @@ -64,7 +68,7 @@ knitr::include_graphics("images/sampling_bowl_1.jpg") One way to answer this question would be to perform an exhaustive count: remove each ball individually, count the number of red balls and the number of white balls, and divide the number of red balls by the total number of balls. However this would be a long and tedious process. -### Using shovel once +### Using the shovel once Instead of performing an exhaustive count, let's insert a shovel into the bowl as seen in Figure \@ref(fig:sampling-exercise-2). @@ -78,23 +82,27 @@ Using the shovel we remove a number of balls as seen in Figure \@ref(fig:samplin knitr::include_graphics("images/sampling_bowl_3_cropped.jpg") ``` -Observe that 17 of the balls are red and there are a total of 5 x 10 = 50 balls and thus 0.34 = 34% of the shovel's balls are red. The proportion of balls that are red in this shovel is a guess of the proportion of balls that are red in the entire bowl. While not as exact as doing an exhaustive count, our guess of 34% took much less time and energy to obtain. +Observe that 17 of the balls are red and there are a total of 5 x 10 = 50 balls and thus 0.34 = 34% of the shovel's balls are red. 
We can view the proportion of balls that are red *in this shovel* as a guess of the proportion of balls that are red *in the entire bowl*. While not as exact as doing an exhaustive count, our guess of 34% took much less time and energy to obtain.
-However say we started this activity over from the beginning. In other words, we replace the 50 balls back into the ball and start over. Would we remove exactly 17 red balls again? In other words, would our guess at the proportion of the bowl's balls that are red by exactly 34% again? Maybe?
+However, say we started this activity over from the beginning. In other words, we replace the 50 balls back into the bowl and start over. Would we remove exactly 17 red balls again? In other words, would our guess at the proportion of the bowl's balls that are red be exactly 34% again? Maybe?
-What if we repeated this exercise several times? Would I obtain exactly 17 red balls each time? In other words, would our guess at the proportion of the bowl's balls that are red by exactly 34% every time? Surely not. Let's actually do and observe the results with the help of 33 of our friends.
+What if we repeated this exercise several times? Would we obtain exactly 17 red balls each time? In other words, would our guess at the proportion of the bowl's balls that are red be exactly 34% every time? Surely not. Let's actually do this and observe the results with the help of 33 of our friends.
-### Using shovel 33 times {#student-shovels}
+### Using the shovel 33 times {#student-shovels}
-Each of our 33 friends will do the following: use the shovel to remove 50 balls each, count the number of red balls, use this number to compute the proportion of the 50 balls they removed that are red, return the balls into the bowl, and mix the contents of the bowl a little to not let a previous group;s results influence the next group's set of results.
+Each of our 33 friends will do the following:
-```{r sampling-exercise-3b, echo=FALSE, fig.cap="Repeating sampling activity 33 times.", purl=FALSE, out.width = "20%"}
+- use the shovel to remove 50 balls,
+- count the number of red balls,
+- use this number to compute the proportion of the 50 balls they removed that are red,
+- return the balls into the bowl, and
+- mix the contents of the bowl a little to not let a previous group's results influence the next group's set of results.
+
+```{r sampling-exercise-3b, echo=FALSE, fig.show='hold', fig.cap="Repeating sampling activity 33 times.", purl=FALSE, out.width = "20%"}
# # Need new picture #
-knitr::include_graphics("images/sampling/tactile_2_a.jpg")
-knitr::include_graphics("images/sampling/tactile_2_b.jpg")
-knitr::include_graphics("images/sampling/tactile_2_c.jpg")
+knitr::include_graphics(c("images/sampling/tactile_2_a.jpg", "images/sampling/tactile_2_b.jpg", "images/sampling/tactile_2_c.jpg"))
```
However, before returning the balls into the bowl, they are going to mark the proportion of the 50 balls they removed that are red in a histogram as seen in Figure \@ref(fig:sampling-exercise-4).
@@ -113,10 +121,10 @@ Observe the following about the histogram in Figure \@ref(fig:sampling-exercise-
* At the low end, one group removed 50 balls from the bowl with proportion between 0.20 = 20% and 0.25 = 25% red.
* At the high end, another group removed 50 balls from the bowl with proportion between 0.45 = 45% and 0.5 = 50% red.
-* However the most frequently occuring proportions were between 0.30 = 30% and 0.35 = 35% red, right in the middle of the distribution.
+* However the most frequently occurring proportions were between 0.30 = 30% and 0.35 = 35% red, right in the middle of the distribution.
* The shape of this distribution is somewhat bell-shaped.
-Let's construct this same hand-drawn histogram in R using your data visualization skills that you honed in Chapter \@ref(viz). We saved our 33 groups of friend's proportion red in a data frame `tactile_prop_red` which is included in the `moderndive` package you loaded earlier.
+Let's construct this same hand-drawn histogram in R using your data visualization skills that you honed in Chapter \@ref(viz). We saved our 33 groups of friends' proportions red in a data frame `tactile_prop_red` which is included in the `moderndive` package you loaded earlier.
```{r, eval=FALSE} tactile_prop_red ```
@@ -138,9 +146,9 @@ tactile_prop_red %>% latex_options = c("HOLD_position", "repeat_header")) ```
-Observe for each `group` we have their names, the number of `red_balls` they obtained, and the corresponding proportion out of 50 balls that were red `prop_red`. Observe, we also have a variable `replicate` enumerating each of the 33 groups; we chose this name because each row can be viewed as one instance of a replicated activity: using the shovel to remove 50 balls and computing the proportion of those balls that are red.
+Observe that for each `group` we have their names, the number of `red_balls` they obtained, and the corresponding proportion out of 50 balls that were red named `prop_red`. Observe that we also have a variable `replicate` enumerating each of the 33 groups; we chose this name because each row can be viewed as one instance of a replicated activity: using the shovel to remove 50 balls and computing the proportion of those balls that are red.
-We visualize the distribution of these 33 proportions using a `geom_histogram()` with `binwidth = 0.05` in Figure \@ref(fig:samplingdistribution-tactile), which matches our hand-drawn histogram from the earlier Figure \@ref(fig:sampling-exercise-5). Recall that using a histogram is appropriate since `prop_red` is a numerical variable.
+We visualize the distribution of these 33 proportions using a `geom_histogram()` with `binwidth = 0.05` in Figure \@ref(fig:samplingdistribution-tactile), which is appropriate since the variable `prop_red` is numerical. This computer-generated histogram matches our hand-drawn histogram from the earlier Figure \@ref(fig:sampling-exercise-5).
```{r eval=FALSE} ggplot(tactile_prop_red, aes(x = prop_red)) + geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
@@ -156,56 +164,61 @@ tactile_histogram + title = "Distribution of 33 proportions red") ```
+
### What are we doing here?
-What we just demonstrated in this activity is the statistical concept of sampling. We would like to know the proportion of the bowl's balls that are red. However, because the bowl has a very large number of balls, performing an exhaustive count of the number of red and white balls in the bowl would be very costly, both in terms of both time and energy. We therefore instead mix the balls and extract a sample of 50 balls using the shovel. Using this sample of 50 balls, we approximate the proportion of the bowl's balls that are red using the proportion of the shovel's balls that are red, 17 red balls out of 50 balls = 34% in our earlier example.
-
-Moreover, because we mixed the balls before each use of the shovel, the samples were randomly drawn. Because each sample was drawn at random, the samples were different from each other.
Because the samples were different from each other, we obtained the different proportions red observed in Table \@ref(tab:tactilered). This is known as the concept of *sampling variation*.
+What we just demonstrated in this activity is the statistical concept of *sampling*. We would like to know the proportion of the bowl's balls that are red, but because the bowl has a very large number of balls, performing an exhaustive count of the number of red and white balls in the bowl would be very costly in terms of both time and energy. We therefore extract a sample of 50 balls using the shovel. Using this sample of 50 balls, we estimate the proportion of the bowl's balls that are red using the proportion of the shovel's balls that are red. This estimate in our earlier example was 17 red balls out of 50 balls = 34%. Moreover, because we mixed the balls before each use of the shovel, the samples were randomly drawn. Because each sample was drawn at random, the samples were different from each other. Because the samples were different from each other, we obtained the different proportions red observed in Table \@ref(tab:tactilered). This is known as the concept of *sampling variation*.
-In Section \@ref(sampling-simulation) we'll mimic the hands-on sampling activity we just performed in a *computer simulation*; using a computer will allow us to repeat the above sampling activity much more than 33 times. Using a computer, not only will be able to repeat the activity a very large number of times, but we will also be able to repeat it with different sized shovels.
+In Section \@ref(sampling-simulation) we'll mimic the hands-on sampling activity we just performed in a *computer simulation*; using a computer will allow us to repeat the above sampling activity much more than 33 times. Using a computer, not only will we be able to repeat the hands-on activity a very large number of times, but we will also be able to repeat it using different sized shovels.
-After these simulations, in Section \@ref(sampling-goal) we'll explicitly articulate our goals for this chapter: understanding the concept of sampling variation and the role that sample size plays in this variation.
+The purpose of these simulations is to develop an understanding of two key concepts relating to sampling: sampling variation and the role that sample size plays in this variation. To this end, we'll present you with definitions, terminology, and notation related to sampling in Section \@ref(sampling-framework). As with many disciplines, there are definitions, terminology, and notation that seem very inaccessible and even confusing at first. However, as with many difficult topics, if you truly understand the underlying concepts and practice, practice, practice, you'll be able to master these topics.
-After having armed ourselves with this conceptual understanding of sampling, we'll present you with definitions, terminology, and notation related to sampling in Section \@ref(sampling-framework). As with many disciplines, there are definitions, terminology, and notation that seem very inaccessible and even confusing at first. However, as with many difficult topics, if you truly understand the underlying concepts and practice, practice, practice, you'll be able to master these topics.
-To tie the contents of this chapter to the real-word, we'll present an example of one of the most recognizable uses of sampling: polls.
In Section \@ref(sampling-case-study) we'll look at a particular case study: a 2013 poll on then President Obama's popularity amongst young Americans, conducted by the Harvard Kennedy School's Institute of Politics.
-
-We'll close this chapter by generalizing the above sampling from the bowl activity to other scenarios, distiguishing between *random sampling* and *random assignment*, presenting the theoretical result underpinning all our results, and presenting a few mathematical formulas that relate to the concepts and ideas explored in this chapter.
+To tie the contents of this chapter to the real-world, we'll present an example of one of the most recognizable uses of sampling: polls. In Section \@ref(sampling-case-study) we'll look at a particular case study: a 2013 poll on then President Obama's popularity among young Americans, conducted by the Harvard Kennedy School's Institute of Politics.
+We'll close this chapter by generalizing the above sampling from the bowl activity to other scenarios, distinguishing between *random sampling* and *random assignment*, presenting the theoretical result underpinning all our results, and presenting a few mathematical formulas that relate to the concepts and ideas explored in this chapter.
+
----
+***
## Computer simulation {#sampling-simulation}
-What we performed in Section \@ref(sampling-activity) is a *simulation* of sampling. The crowd-sourced Wikipedia definition of a simulation states: "A simulation is an approximate imitation of the operation of a process or system."^[[Wikipedia entry for simulation](https://en.wikipedia.org/wiki/Simulation)] One example of simulations in practice are a flight simulators: before pilots in training are allowed to fly an actual plane, they first practice on a computer that attempts to mimic the reality of flying an actual plane as best as possible.
+What we performed in Section \@ref(sampling-activity) is a *simulation* of sampling. In other words, we were not in a real-life sampling scenario in order to answer a real-life question, but rather we were mimicking such a scenario with our bowl and shovel. The crowd-sourced Wikipedia definition of a simulation states: "A simulation is an approximate imitation of the operation of a process or system."^[[Wikipedia entry for simulation](https://en.wikipedia.org/wiki/Simulation)] One example of simulations in practice is flight simulators: before pilots in training are allowed to fly an actual plane, they first practice on a computer that attempts to mimic the reality of flying an actual plane as best as possible.
+
+Now you might be thinking that simulations must necessarily take place on a computer. However, this is not necessarily true. Take for example crash test dummies: before cars are made available to the market, automobile engineers test their safety by mimicking the reality for passengers of being in an automobile crash. To distinguish between these two simulation types, we'll term a simulation performed in real life as a "tactile" simulation done with your hands and to the touch as opposed to a "virtual" simulation performed on a computer.
-Now you might be thinking that simulations must necssarily take place on computer. However, this is not necessarily true. Take for example crash test dummies: before cars are made available to the market, automobile engineers test their safety by mimicking the reality for passengeres of being in an automobile crash.
To distinguish between these two simulation types, we'll term a simulation performed in real-life as a "tactile" simulation done with your hands and to the touch as opposed to a "virtual" simulation performed on a computer. + Example of a "tactile" simulation | Example of "virtual" simulation :-------------------------:|:-------------------------: ![](images/crash-test-dummy.jpg){ height=1.7in } | ![](images/flight-simulator.jpg){ height=1.7in } -So while in Section \@ref(sampling-activity) we performed a "tactile" simulation of sampling using an actual bowl and an actual shovel with our hands, in this section we'll perform a "virtual" simulation using a virtual bowl and a virtual shovel with our computers. +So while in Section \@ref(sampling-activity) we performed a "tactile" simulation of sampling using an actual bowl and an actual shovel with our hands, in this section we'll perform a "virtual" simulation using a "virtual" bowl and a "virtual" shovel with our computers. -### Using shovel once +### Using the virtual shovel once -Let's start by perfoming the virtual analogue of the tactile sampling simulation we performed in \@ref(sampling-activity). We first need a virtual analogue of the bowl seen in Figure \@ref(fig:sampling-exercise-1). To this end, we created a data frame called `bowl` whose rows correspond exactly with the contents of the actual bowl; we've included this data frame in the `moderndive` package. +Let's start by performing the virtual analogue of the tactile sampling simulation we performed in \@ref(sampling-activity). We first need a virtual analogue of the bowl seen in Figure \@ref(fig:sampling-exercise-1). To this end, we included a data frame `bowl` in the `moderndive` package whose rows correspond exactly with the contents of the actual bowl. ```{r} bowl ``` -Observe in the output that `bowl` has 2400 rows, telling us that the bowl contains 2400 equally-sized balls. The first variable `ball_ID` is used merely as an "identification variable" for this data frame as discussed in Subsection \@ref(identification-vs-measurement); none of the balls in the actual bowl are marked with numbers. The second variable `color` indicates whether a particular virtual ball i s red or white. Run `View(bowl)` in RStudio and scroll through the contents to convince yourselves that `bowl` is indeed a virtual version of the actual bowl in Figure \@ref(fig:sampling-exercise-1). + -Now that we have a virtual analogue of our bowl, we now need a virtual analogue for the shovel seen in Figure \@ref(fig:sampling-exercise-2) to generate our random samples of 50 balls. We're going to use the `rep_sample_n()` function included in the `moderndive` package that allows us to take `rep`eated/`rep`licated `samples of size `n`. Run the following and explore `virtual_shovel`'s contents in the spreadsheet viewer. +Observe in the output that `bowl` has 2400 rows, telling us that the bowl contains 2400 equally-sized balls. The first variable `ball_ID` is used merely as an "identification variable" for this data frame as discussed in Subsection \@ref(identification-vs-measurement-variables); none of the balls in the actual bowl are marked with numbers. The second variable `color` indicates whether a particular virtual ball is red or white. View the contents of the bowl in RStudio's data viewer and scroll through the contents to convince yourselves that `bowl` is indeed a virtual version of the actual bowl in Figure \@ref(fig:sampling-exercise-1). 
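If you would rather not scroll through all 2400 rows, here is a quick side check you could run instead. This is a sketch only, not part of the chapter's main code; it just uses `dplyr` verbs from Chapter \@ref(wrangling) on the `bowl` data frame loaded above.

```{r, eval=FALSE}
# Sketch only: two quick sanity checks on the virtual bowl.
bowl %>% 
  nrow()            # 2400 balls in total
bowl %>% 
  distinct(color)   # the only colors present are "red" and "white"
```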
+ +Now that we have a virtual analogue of our bowl, we now need a virtual analogue for the shovel seen in Figure \@ref(fig:sampling-exercise-2); we'll use this virtual shovel to generate our virtual random samples of 50 balls. We're going to use the `rep_sample_n()` function included in the `moderndive` package. This function allows us to take `rep`eated, or `rep`licated, `samples` of size `n`. Run the following and explore `virtual_shovel`'s contents in the RStudio viewer. ```{r, eval=FALSE} virtual_shovel <- bowl %>% @@ -230,24 +243,24 @@ virtual_shovel %>% latex_options = c("HOLD_position")) ``` -The `ball_ID` variable identifies which of balls from `bowl` are included in our sample of 50 balls and `color` denotes it's color. However what does the `replicate` variable indicate? In `virtual_shovel`'s case, `replicate` is equal to 1 for all 50 rows. This is telling us that these 50 rows correspond to a first repeated/replicated use of the shovel, in other words our first sample. We'll see below when we "virtually" take 33 samples below, `replicate` will take values between 1 and 33. Before we do this, let's compute the proportion of balls in our virtual sample of size 50 that are red. We'll be using the `dplyr` data wrangling verbs you learned in Chapter \@ref(wrangling). Let's breakdown the steps individually: +The `ball_ID` variable identifies which of the balls from `bowl` are included in our sample of 50 balls and `color` denotes its color. However what does the `replicate` variable indicate? In `virtual_shovel`'s case, `replicate` is equal to 1 for all 50 rows. This is telling us that these 50 rows correspond to a first repeated/replicated use of the shovel, in our case our first sample. We'll see below when we "virtually" take 33 samples, `replicate` will take values between 1 and 33. Before we do this, let's compute the proportion of balls in our virtual sample of size 50 that are red using the `dplyr` data wrangling verbs you learned in Chapter \@ref(wrangling). Let's breakdown the steps individually: -First, for each of our 50 sampled balls, identify if it is red or not using the boolean algebra. For every row where `color == "red"`, the boolean `TRUE` is returned and for every row where `color` is not equal to `"red"`, the boolean `FALSE` is returned. Let's create a new boolean variable `is_red` using the `mutate()` function from Section \@ref(mutate): +First, for each of our 50 sampled balls, identify if it is red using a test for equality using `==`. For every row where `color == "red"`, the Boolean `TRUE` is returned and for every row where `color` is not equal to `"red"`, the Boolean `FALSE` is returned. Let's create a new Boolean variable `is_red` using the `mutate()` function from Section \@ref(mutate): ```{r} virtual_shovel %>% - mutate(is_red = color == "red") + mutate(is_red = (color == "red")) ``` Second, we compute the number of balls out of 50 that are red using the `summarize()` function. Recall from Section \@ref(summarize) that `summarize()` takes a data frame with many rows and returns a data frame with a single row containing summary statistics that you specify, like `mean()` and `median()`. In this case we use the `sum()`: ```{r} virtual_shovel %>% - mutate(is_red = color == "red") %>% + mutate(is_red = (color == "red")) %>% summarize(num_red = sum(is_red)) ``` -Why does this work? Because R treats `TRUE` like the number `1` and `FALSE` like the number `0`. 
So summing the number of `TRUE`'s and `FALSE`'s is equivalent to summing `1`'s and `0`'s, which in the end which counts the number of balls where `color` is `red`. +Why does this work? Because R treats `TRUE` like the number `1` and `FALSE` like the number `0`. So summing the number of `TRUE`'s and `FALSE`'s is equivalent to summing `1`'s and `0`'s, which in the end counts the number of balls where `color` is `red`. In our case, 17 of the 50 balls were red. Third and last, we compute the proportion of the 50 sampled balls that are red by dividing `num_red` by 50: @@ -258,7 +271,7 @@ virtual_shovel %>% mutate(prop_red = num_red / 50) ``` -Let's make the above code a little more compact and succinct by combining the first `mutate()` and the `summarize()` as follows: +In other words, this "virtual" sample's balls were 34% red. Let's make the above code a little more compact and succinct by combining the first `mutate()` and the `summarize()` as follows: ```{r} virtual_shovel %>% @@ -266,16 +279,12 @@ virtual_shovel %>% mutate(prop_red = num_red / 50) ``` -Great! 44% of `virtual_shovel`'s 50 balls were red! So based on this particular sample, our guess at the proportion of `bowl`'s balls that are red is 44%. But remember from our earlier tactile sampling activity, that if we repeated this sampling, we would not necessarily obtain a sample of 50 balls with 44% of them being red; there will likely be some variation. - -In fact in Table \@ref(tab:virtual-shovel) we displayed 33 such proportions based on 33 tactile samples and then in Figure \@ref(fig:sampling-exercise-5) we visualized the distribution of the 33 proportions in a histogram. Let's now perform the virtual analogue of having 33 groups of students use the sampling shovel! - +Great! 34% of `virtual_shovel`'s 50 balls were red! So based on this particular sample, our guess at the proportion of the `bowl`'s balls that are red is 34%. But remember from our earlier tactile sampling activity that if we repeated this sampling, we would not necessarily obtain a sample of 50 balls with 34% of them being red again; there will likely be some variation. In fact in Table \@ref(tab:virtual-shovel) we displayed 33 such proportions based on 33 tactile samples and then in Figure \@ref(fig:sampling-exercise-5) we visualized the distribution of the 33 proportions in a histogram. Let's now perform the virtual analogue of having 33 groups of students use the sampling shovel! -### Using shovel 33 times -Recall however in our tactile sampling exercise in Section \@ref(sampling-activity) above that we had 33 groups of students each use the shovel, yielding 33 samples of size 50 balls, which we used to then compute 33 proportions. In other words we *repeated/replicated* the sampling activity 33 times. We can perform this repeated/replicated sampling virtually by once again using our virtual shovel funciton `rep_sample_n()`, but by adding the `reps = 33` argument indicating we want to repeat the sampling 33 times. +### Using the virtual shovel 33 times -Be sure to scroll through the contents of `virtual_samples` in RStudio's spreadsheet viewer. +Recall that in our tactile sampling exercise in Section \@ref(sampling-activity) we had 33 groups of students each use the shovel, yielding 33 samples of size 50 balls, which we then used to compute 33 proportions. In other words we repeated/replicated using the shovel 33 times. 
We can perform this repeated/replicated sampling virtually by once again using our virtual shovel function `rep_sample_n()`, but by adding the `reps = 33` argument, indicating we want to repeat the sampling 33 times. Be sure to scroll through the contents of `virtual_samples` in RStudio's viewer.
```{r, eval=FALSE} virtual_samples <- bowl %>%
@@ -287,9 +296,9 @@ virtual_samples <- bowl %>% rep_sample_n(size = 50, reps = 33) ```
-Observe that while the first 50 rows of `replicate` are equal to `1` the next 50 are equal to `2`. This is indicating that the first 50 rows correspond to the first sample of 50 balls while the next 50 correspond to the second sample of 50 balls. This pattern continues for all `reps = 33` replicates and thus `virtual_samples` has 33 $\times$ 50 = 1650 rows.
+Observe that while the first 50 rows of `replicate` are equal to `1`, the next 50 rows of `replicate` are equal to `2`. This is telling us that the first 50 rows correspond to the first sample of 50 balls while the next 50 correspond to the second sample of 50 balls. This pattern continues for all `reps = 33` replicates and thus `virtual_samples` has 33 $\times$ 50 = 1650 rows.
-Let's now take the data frame `virtual_samples` with 33 $\times$ 50 = 1650 rows corresponding to 33 samples of size 50 and compute the resulting 33 proportions red. We'll use the same `dplyr` verbs as we did in the previous section, but this time with an additional `group_by()` the `replicate` variable. Recall from Section \@ref(groupby) that by assigning grouping "meta-data" before `summarizing()`, we'll obtain 33 different proportions red:
+Let's now take the data frame `virtual_samples` with 33 $\times$ 50 = 1650 rows corresponding to 33 samples of size 50 balls and compute the resulting 33 proportions red. We'll use the same `dplyr` verbs as we did in the previous section, but this time with an additional `group_by()` of the `replicate` variable. Recall from Section \@ref(groupby) that by assigning the grouping variable "meta-data" before `summarizing()`, we'll obtain 33 different proportions red:
```{r, eval=FALSE} virtual_prop_red <- virtual_samples %>%
@@ -299,7 +308,9 @@ virtual_prop_red <- virtual_samples %>% View(virtual_prop_red) ```
-Let's display only the first 10 out of 33 rows of `virtual_prop_red`'s contents in Table \@ref(tab:tactilered).
+Let's display only the first 10 out of 33 rows of `virtual_prop_red`'s contents in Table \@ref(tab:virtualred). As one would expect, there is variation in the resulting `prop_red` proportions red for the first 10 out of 33 repeated/replicated samples.
+
+
```{r virtualred, echo=FALSE} virtual_prop_red <- virtual_samples %>%
@@ -338,11 +349,11 @@ virtual_histogram + title = "Distribution of 33 proportions red") ```
-Observe that occasionally we obtained proportions red that are less than 0.3 = 30%, while occasionally we obtained proportions that are greater than 0.45 = 45%. However, the most frequently occuring proportions red out of 50 balls were between 35% and 40% (for 11 out 33 samples). Why do we have these differences in proportions red? Because of sampling variation.
+Observe that occasionally we obtained proportions red that are less than 0.3 = 30%, while on the other hand we occasionally obtained proportions that are greater than 0.45 = 45%. However, the most frequently occurring proportions red out of 50 balls were between 35% and 40% (for 11 out of 33 samples). Why do we have these differences in proportions red? Because of sampling variation.
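If you want to check for yourself how many of the 33 virtual proportions landed in each bin of the histogram, one possible approach is sketched below; it assumes the `virtual_prop_red` data frame created above, and since the samples are drawn at random, your counts will differ from ours.

```{r, eval=FALSE}
# Sketch only: tally the 33 virtual proportions red by histogram bin.
# Exact counts will vary from run to run because of sampling variation.
virtual_prop_red %>% 
  mutate(bin = cut(prop_red, breaks = seq(0, 1, by = 0.05))) %>% 
  count(bin)
```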
-Let's now compare our virtual results with our tactile results from the previous section in Figure \@ref(fig:tactile-vs-virtual). We see that both histograms, in other words the distribution of the 33 proportions red, are *somewhat* somewhat similar in their center and spread, although not identical; these slight differences are again due to random variation. Furthermore both distributions are *somewhat* bell-shaped.
+Let's now compare our virtual results with our tactile results from the previous section in Figure \@ref(fig:tactile-vs-virtual). We see that both histograms, in other words the distribution of the 33 proportions red, are *somewhat* similar in their center and spread although not identical. These slight differences are again due to random variation. Furthermore both distributions are *somewhat* bell-shaped.
-```{r tactile-vs-virtual, echo=FALSE, fig.cap="Two distribution of 33 proportions based on 33 samples of size 50"}
+```{r tactile-vs-virtual, echo=FALSE, fig.cap="Comparing 33 virtual and 33 tactile proportions red."}
bind_rows( virtual_prop_red %>% mutate(type = "Virtual sampling"),
@@ -359,11 +370,9 @@ bind_rows( ```
-### Using shovel 1000 times
-
-Now say we want study the variation in proportions red not based on 33 samples but rather a very large number of samples, say 1000 samples. We have two choices at this point. We could make our students manually take 1000 samples of 50 balls and compute the corresponding 1000 proportion red out 50 balls. However, this would be cruel and unusual, as it this would be very tedious and time consuming. This is however where computers excel: for automating long and repetitive tasks and having them performed very quickly. Therefore at this point we will abandon tactile sampling in favor of only virtual sampling. Let's once again use the `rep_sample_n()` function with sample `size` set to 50, but the number of replicates `reps = 1000`.
+### Using the virtual shovel 1000 times
-Be sure to scroll through the contents of `virtual_samples` in RStudio's spreadsheet viewer.
+Now say we want to study the variation in proportions red not based on 33 repeated/replicated samples, but rather a very large number of samples, say 1000 samples. We have two choices at this point. We could have our students manually take 1000 samples of 50 balls and compute the corresponding 1000 proportions red out of 50 balls. This would be cruel and unusual however, as this would be very tedious and time-consuming. This is where computers excel: automating long and repetitive tasks while performing them very quickly. Therefore at this point we will abandon tactile sampling in favor of only virtual sampling. Let's once again use the `rep_sample_n()` function with sample `size` set to 50, but this time with the number of replicates `reps = 1000`. Be sure to scroll through the contents of `virtual_samples` in RStudio's viewer.
```{r, eval=FALSE} virtual_samples <- bowl %>%
@@ -376,7 +385,7 @@ virtual_samples <- bowl %>% ```
-Observe that now `virtual_samples` has 1000 $\times$ 50 = 50,000 rows, instead of the 33 $\times$ 50 = 1650 rows from earlier. Using the same code as earlier, let's take the data frame `virtual_samples` with 1000 $\times$ 50 = 50,000 and compute the resulting 33 proportions red.
+Observe that now `virtual_samples` has 1000 $\times$ 50 = 50,000 rows, instead of the 33 $\times$ 50 = 1650 rows from earlier.
Using the same code as earlier, let's take the data frame `virtual_samples` with 1000 $\times$ 50 = 50,000 and compute the resulting 1000 proportions red. ```{r, eval=FALSE} virtual_prop_red <- virtual_samples %>% @@ -388,6 +397,8 @@ View(virtual_prop_red) Observe that we now have 1000 replicates of `prop_red`, the proportion of 50 balls that are red. Using the same code as earlier, let's now visualize the distribution of these 1000 replicates of `prop_red` in a histogram in Figure \@ref(fig:samplingdistribution-virtual-1000). + + ```{r eval=FALSE} ggplot(virtual_prop_red, aes(x = prop_red)) + geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") + @@ -406,18 +417,20 @@ virtual_histogram + title = "Distribution of 1000 proportions red") ``` -Once again, the most frequently occuring proportions red occur between 35% and 40%. Every now and then, we'd obtain proportions are low as between 20% and 25%, and others as high as between 55% and 60%, but those are rarities. Furthermore observe that we now have much more symmetric and smoother bell-shaped distribution. This distribution is in fact a Normal distribution; see Appendix \@ref(appendixA) for a brief discussion on properties of the Normal distribution. +Once again, the most frequently occurring proportions red occur between 35% and 40%. Every now and then, we obtain proportions as low as between 20% and 25%, and others as high as between 55% and 60%. These are rare however. Furthermore observe that we now have a much more symmetric and smoother bell-shaped distribution. This distribution is in fact a Normal distribution; see Appendix \@ref(appendixA) for a brief discussion on properties of the Normal distribution. + + ### Using different shovels -We ask ourselves a question now. Say you had three choices of shovels to extract a sample of balls and compute the corresponding proportion of balls in the shovel that are red: +Now say instead of just one shovel, you had three choices of shovels to extract a sample of balls with. A shovel with 25 slots | A shovel with 50 slots | A shovel with 100 slots :-------------------------:|:-------------------------:|:-------------------------: ![](images/sampling/shovel_025.jpg){ height=1.7in } | ![](images/sampling/shovel_050.jpg){ height=1.7in } | ![](images/sampling/shovel_100.jpg){ height=1.7in } -Which would you choose? In our experience, most people would choose the shovel with 100 slots since it has the biggest sample size, and thus would yield the "best" guess of the proportion of the bowl's 2400 balls that are red. The three shovels above present with three possible sample sizes. Using our newly developed tools for virtual sampling simulations, let's unpack the effect of having different sample sizes! In other words, for `size = 25`, `size = 50`, and `size = 100`: +If your goal was still to estimate the proportion of the bowl's balls that were red, which shovel would you choose? In our experience, most people would choose the shovel with 100 slots since it has the biggest sample size and hence would yield the "best" guess of the proportion of the bowl's 2400 balls that are red. Using our newly developed tools for virtual sampling simulations, let's unpack the effect of having different sample sizes! In other words, let's use `rep_sample_n()` with `size = 25`, `size = 50`, and `size = 100`, while keeping the number of repeated/replicated samples at 1000: 1. Virtually use the appropriate shovel to generate 1000 samples with `size` balls. 1. 
Compute the resulting 1000 replicated of the proportion of the shovel's balls that are red. @@ -505,17 +518,18 @@ virtual_prop_red_100 <- virtual_samples_100 %>% mutate(prop_red = red / 100) %>% mutate(n = 100) -virtual_prop <- bind_rows(virtual_prop_red_25, virtual_prop_red_50,virtual_prop_red_100) +virtual_prop <- bind_rows(virtual_prop_red_25, virtual_prop_red_50, virtual_prop_red_100) -ggplot(virtual_prop, aes(x = prop_red)) + +comparing_sampling_distributions <- ggplot(virtual_prop, aes(x = prop_red)) + geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") + - labs(x = "Sample proportion red", title = "Comparing the distributions of proportion red for different sample sizes") + + labs(x = "Proportion of shovel's balls that are red", title = "Comparing distributions of proportions red for 3 different shovels.") + facet_wrap(~n) +comparing_sampling_distributions ``` -Observe that as the sample size increases, the spread of the 1000 replicates of the proportion red decreases. In other words, as the sample size increases, there are less differences due to sampling variation, and the distribution centers more tightly around the same value. Eyeballing Figure \@ref(fig:comparing-sampling-distributions), things appear to center more tightly around roughly 40%. +Observe that as the sample size increases, the spread of the 1000 replicates of the proportion red decreases. In other words, as the sample size increases, there are less differences due to sampling variation and the distribution centers more tightly around the same value. Eyeballing Figure \@ref(fig:comparing-sampling-distributions), things appear to center tightly around roughly 40%. -We can be numerically explicit about the amount of spread using the *standard deviation*: a summary statistic that measures the amount of spread and variation within a numerical variable; see Appendix \@ref(appendixA) for a brief discussion on properties of the standard deviation. For all three sample sizes, compute the standard deviation of `sd()` of the 1000 proportions red by running the following data wrangling code. +We can be numerically explicit about the amount of spread in our 3 sets of 1000 values of `prop_red` using the *standard deviation*: a summary statistic that measures the amount of spread and variation within a numerical variable; see Appendix \@ref(appendixA) for a brief discussion on properties of the standard deviation. For all three sample sizes, let's compute the standard deviation of the 1000 proportions red by running the following data wrangling code that uses the `sd()` summary function. ```{r, eval = FALSE} # n = 25 @@ -531,16 +545,18 @@ virtual_prop_red_100 %>% summarize(sd = sd(prop_red)) ``` -Let's compare these 3 measures of spread of the distributions we in Table \@ref(tab:comparing-n). +Let's compare these 3 measures of spread of the distributions in Table \@ref(tab:comparing-n). 
```{r comparing-n, eval=TRUE, echo=FALSE} -virtual_prop %>% +comparing_n_table <- virtual_prop %>% group_by(n) %>% summarize(sd = sd(prop_red)) %>% - rename(`sample size` = n, `standard deviation` = sd) %>% + rename(`Number of slots in shovel` = n, `Standard deviation of proportions red` = sd) + +comparing_n_table %>% kable( digits = 3, - caption = "Comparing the standard deviations of the proportion red for different sample sizes.", + caption = "Comparing standard deviations of proportions red for 3 different shovels.", booktabs = TRUE ) %>% kable_styling(font_size = ifelse(knitr:::is_latex_output(), 10, 16), @@ -550,280 +566,266 @@ virtual_prop %>% As we observed visually in Figure \@ref(fig:comparing-sampling-distributions), as the sample size increases our numerical measure of spread decreases; there is less variation in our proportions red. In other words, as the sample size increases, our guesses at the true proportion of the bowl's balls that are red get more consistent and precise. ---- -## Our goal {#sampling-goal} +*** -Simply put: study the effects of sampling variation -### What is sampling variation? - -### Effect of sample size +## Sampling framework {#sampling-framework} +In both our "hands-on" tactile simulations and our "virtual" simulations using a computer, we used sampling for the purpose of estimation: we extract samples in order to estimate the proportion of the bowl's balls that are red. We used sampling as a cheaper and less-time consuming approach than to do a full census of all the balls. Our virtual simulations all built up to the results shown in Figure \@ref(fig:comparing-sampling-distributions) and Table \@ref(tab:comparing-n), comparing 1000 proportions red based on samples of size 25, 50, and 100. This was our first attempt at understanding two key concepts relating to sampling for estimation: ---- +1. The effect of sampling variation on our estimates. +1. The effect of sample size on sampling variation. +Let's now introduce some terminology and notation as well as statistical definitions related to sampling. Given the number of new words to learn, you will likely have to read these next three subsections multiple times. Keep in mind however that none of the concepts underlying these terminology, notation, and definitions are any different than the concepts underlying our simulations in Sections \@ref(sampling-activity) and \@ref(sampling-simulation); it will simply take time and practice to master them. -## Sampling framework {#sampling-framework} -### Terminology - -Let's now define some concepts and terminology important to understand sampling, being sure to tie things back to the above example. You might have to read this a couple times more as you progress throughout this book, as they are very deeply layered concepts. However as we'll soon see, they are very powerful concepts that open up a whole new world of scientific thinking: - -1. **Population**: The population is a set of $N$ observations of interest. - + Above Ex: Our bowl consisting of $N=2400$ identically-shaped balls. -1. **Population parameter**: A population parameter is a numerical summary value about the population. In most settings, this is a value that's unknown and you wish you knew it. - + Above Ex: The true *population proportion $p$* of the balls in the bowl that are red. - + In this scenario the parameter of interest is the proportion, but in others it could be numerical summary values like the mean, median, etc. -1. 
**Census**: An exhaustive enumeration/counting of all observations in the population in order to compute the population parameter's numerical value *exactly*. - + Above Ex: This corresponds to manually going over all $N=2400$ balls and counting the number that are red, thereby allowing us to compute the population proportion $p$ of the balls that are red exactly. - + When $N$ is small, a census is feasible. However, when $N$ is large, a census can get very expensive, either in terms of time, energy, or money. - + Ex: the Decennial United States census attempts to exhaustively count the US population. Consequently it is a very expensive, but necessary, procedure. -1. **Sampling**: Collecting a sample of size $n$ of observations from the population. Typically the sample size $n$ is much smaller than the population size $N$, thereby making sampling a much cheaper procedure than a census. - + Above Ex: Using the shovel to extract a sample of $n=50$ balls. - + It is important to remember that the lowercase $n$ corresponds to the sample size and uppercase $N$ corresponds to the population size, thus $n \leq N$. -1. **Point estimates/sample statistics**: A summary statistic based on the sample of size $n$ that *estimates* the unknown population parameter. - + Above Ex: it's the *sample proportion $\widehat{p}$* red of the balls in the sample of size $n=50$. - + Key: The sample proportion red $\widehat{p}$ is an *estimate* of the true unknown population proportion red $p$. -1. **Representative sampling**: A sample is said be a *representative sample* if it "looks like the population." In other words, the sample's characteristics are a good representation of the population's characteristics. - + Above Ex: Does our sample of $n=50$ balls "look like" the contents of the larger set of $N=2400$ balls in the bowl? -1. **Generalizability**: We say a sample is *generalizable* if any results of based on the sample can generalize to the population. - + Above Ex: Is $\widehat{p}$ a "good guess" of $p$? - + In other words, can we *infer* about the true proportion of the balls in the bowl that are red, based on the results of our sample of $n=50$ balls? -1. **Bias**: In a statistical sense, we say *bias* occurs if certain observations in a population have a higher chance of being sampled than others. We say a sampling procedure is *unbiased* if every observation in a population had an equal chance of being sampled. - + Above Ex: Did each ball, irrespective of color, have an equal chance of being sampled, meaning the sampling was unbiased? We feel since the balls are all of the same size, there isn't any bias in the sampling. If, say, the red balls had a much larger diameter than the white ones then you might have have a higher or lower probability of now sampling red balls. -1. **Random sampling**: We say a sampling procedure is *random* if we sample randomly from the population in an unbiased fashion. - + Above Ex: As long as you mixed the bowl sufficiently before sampling, your samples of size $n=50$ balls would be random. - -### Sampling for inference - -Why did we go through the trouble of enumerating all the above concepts and terminology? 
-
-**The moral of the story**:
+### Terminology & notation

-> * If the sampling of a sample of size $n$ is done at **random**, then
-> * The sample is **unbiased** and **representative** of the population, thus
-> * Any result based on the sample can **generalize** to the population, thus
-> * The **point estimate/sample statistic** is a "good guess" of the unknown population parameter of interest
+Here is a list of terminology and mathematical notation relating to sampling. For each item, we'll be sure to tie them to our simulations in Sections \@ref(sampling-activity) and \@ref(sampling-simulation).

-**and thus we have inferred about the population based on our sample. In the above example**:
+1. **(Study) Population**: A (study) population is a collection of individuals or observations about which we are interested. We mathematically denote the population's size using upper case $N$. In our simulations, the (study) population was the collection of $N$ = 2400 identically sized red and white balls contained in the bowl.
+1. **Population parameter**: A population parameter is a numerical summary quantity about the population that is unknown, but you wish you knew. For example, when this quantity is a mean, the population parameter of interest is the *population mean*, which is mathematically denoted with the Greek letter $\mu$ (pronounced "mu"). In our simulations, however, since we were interested in the proportion of the bowl's balls that were red, the population parameter is the *population proportion*, which is mathematically denoted with the letter $p$.
+1. **Census**: An exhaustive enumeration or counting of all $N$ individuals or observations in the population in order to compute the population parameter's value *exactly*. In our simulations, this would correspond to manually going over all $N$ = 2400 balls in the bowl, counting the number that are red, and computing the population proportion $p$ of the balls that are red *exactly*. When the number $N$ of individuals or observations in our population is large, as was the case with our bowl, a census can be very expensive in terms of time, energy, and money.
+1. **Sampling**: Sampling is the act of collecting a sample from the population when we don't have the means to perform a census. We mathematically denote the sample's size using lower case $n$, as opposed to upper case $N$ which denotes the population's size. Typically the sample size $n$ is much smaller than the population size $N$, thereby making sampling a much cheaper procedure than a census. In our simulations, we used shovels with 25, 50, and 100 slots to extract samples of size $n$ = 25, $n$ = 50, and $n$ = 100 balls.
+1. **Point estimate (AKA sample statistic)**: A summary statistic computed from the sample that *estimates* the unknown population parameter. In our simulations, recall that the unknown population parameter was the population proportion and that this is mathematically denoted with $p$. Our point estimate is the *sample proportion*: the proportion of the shovel's balls that are red. In other words, it is our guess of the proportion of the bowl's balls that are red. We mathematically denote the sample proportion using $\widehat{p}$; the "hat" on top of the $p$ indicates that it is an estimate of the unknown population proportion $p$.
+1. **Representative sampling**: A sample is said to be a *representative sample* if it is representative of the population.
In other words, are the sample's characteristics a good representation of the population's characteristics? In our simulations, are the samples of $n$ balls extracted using our shovels representative of the bowl's $N$=2400 balls?
+1. **Generalizability**: We say a sample is *generalizable* if any results based on the sample can generalize to the population. In other words, can the value of the point estimate be generalized to estimate the value of the population parameter well? In our simulations, can we generalize the values of the sample proportions red of our shovels to the population proportion red of the bowl? Using mathematical notation, is $\widehat{p}$ a "good guess" of $p$?
+1. **Bias**: In a statistical sense, we say *bias* occurs if certain individuals or observations in a population have a higher chance of being included in a sample than others. We say a sampling procedure is *unbiased* if every observation in a population has an equal chance of being sampled. In our simulations, since each ball had the same size and hence an equal chance of being sampled in our shovels, our samples were unbiased.
+1. **Random sampling**: We say a sampling procedure is *random* if we sample randomly from the population in an unbiased fashion. In our simulations, this would correspond to sufficiently mixing the bowl before each use of the shovel.

+Phew, that's a lot of new terminology and notation to learn! Let's put them all together to describe the paradigm of sampling:

+> * If the sampling of a sample of size $n$ is done at **random**, then
+> * the sample is **unbiased** and **representative** of the population of size $N$, thus
+> * any result based on the sample can **generalize** to the population, thus
+> * the point estimate is a **"good guess"** of the unknown population parameter, thus
+> * instead of performing a census, we can **infer** about the population using sampling.

+Restricting consideration to a shovel with 50 slots from our simulations,

+> * If we extract a sample of $n=50$ balls at **random**, in other words we mix the equally-sized balls before using the shovel, then
+> * the contents of the shovel are an **unbiased representation** of the contents of the bowl's 2400 balls, thus
+> * any result based on the sample of balls can **generalize** to the bowl, thus
+> * the sample proportion $\widehat{p}$ of the $n=50$ balls in the shovel that are red is a **"good guess"** of the population proportion $p$ of the $N$=2400 balls that are red, thus
+> * instead of manually going over all the balls in the bowl, we can **infer** about the bowl using the shovel.
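To tie this terminology and notation to code, here is a small optional sketch that is not part of the chapter's main analysis. It assumes the `bowl` data frame (from the `moderndive` package) and the `rep_sample_n()` function used earlier in this chapter are available; the first pipeline performs a census to compute the population proportion $p$ exactly, while the second computes a single point estimate $\widehat{p}$ from one random sample of size $n$ = 50.

```{r, eval=FALSE}
library(dplyr)
library(moderndive)  # bowl data frame; rep_sample_n() is also available in the infer package

# Census: use all N = 2400 balls to compute the population proportion p exactly
bowl %>%
  summarize(p = mean(color == "red"))

# Sampling: compute one point estimate p-hat from a single random sample of n = 50 balls
bowl %>%
  rep_sample_n(size = 50, reps = 1) %>%
  summarize(p_hat = mean(color == "red"))
```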
-In the case of the histogram in Figure \@ref(fig:samplingdistribution-tactile), its the distribution of the sample proportion red $\widehat{p}$ based on $n=50$ sampled balls from the bowl, for which we want to estimate the unknown *population proportion* $p$ of the $N=2400$ balls that are red. Sampling distributions describe how values of the sample proportion red $\widehat{p}$ will vary from sample to sample due to **sampling variability** and thus identify "typical" and "atypical" values of $\widehat{p}$. For example +Note that last word we wrote in bold: **infer**. The act of "inferring" is to deduce or conclude (information) from evidence and reasoning. In our simulations, we wanted to infer about the proportion of the bowl's balls that are red. *Statistical inference* is the theory, methods, and practice of forming judgments about the parameters of a population and the reliability of statistical relationships, typically on the basis of random sampling (Wikipedia). In other words, statistical inference is the act of inference via sampling. In the upcoming Chapter \@ref(confidence-intervals) on confidence intervals, we'll introduce the `infer` package, which makes statistical inference "tidy" and transparent. It is why this third portion of the book is called "Statistical inference via infer". -* Obtaining a sample that yields $\widehat{p} = 0.36$ would be considered typical, common, and plausible since it would in theory occur frequently. -* Obtaining a sample that yields $\widehat{p} = 0.8$ would be considered atypical, uncommon, and implausible since it lies far away from most of the distribution. +### Statistical definitions -Let's now ask ourselves the following questions: +Now for some important statistical definitions related to sampling. As a refresher of our 1000 repeated/replicated virtual samples of size $n$ = 25, $n$ = 50, and $n$ = 100 in Section \@ref(sampling-simulation), let's display Figure \@ref(fig:comparing-sampling-distributions) again below. -1. Where is the sampling distribution centered? -1. What is the spread of this sampling distribution? +```{r echo=FALSE} +comparing_sampling_distributions +``` -Recall from Section \@ref(summarize) the mean and the standard deviation are two summary statistics that would answer this question: +These types of distributions have a special name: **sampling distributions**; their visualization displays the effect of sampling variation on the distribution of any point estimate, in this case the sample proportion $\widehat{p}$. Using these sampling distributions, for a given sample size $n$, we can make statements about what values we can typically expect. For example, observe the centers of all three sampling distributions: they are all roughly centered around 0.4 = 40%. Furthermore, observe that while we are somewhat likely to observe sample proportions red of 0.2 = 20% when using the shovel with 25 slots, we will almost never observe this sample proportion when using the shovel with 100 slots. Observe also the effect of sample size on the sampling variation. As the sample size $n$ increases from 25 to 50 to 100, the spread/variation of the sampling distribution decreases and thus the values cluster more and more tightly around the same center of around 40%. 
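To make statements like "somewhat likely" and "almost never" a little more concrete, we can simply count replicates. The following optional sketch, which assumes the `virtual_prop` data frame created above is still in your workspace, computes for each shovel the fraction of the 1000 sample proportions that were 0.2 = 20% or less:

```{r, eval=FALSE}
# For each sample size n, what fraction of the 1000 sample proportions red
# were 20% or less?
virtual_prop %>%
  group_by(n) %>%
  summarize(prop_at_most_20_percent = mean(prop_red <= 0.2))
```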
We quantified this spread/variation using the standard deviation of our proportions in Table \@ref(tab:comparing-n), which we display again below: -```{r, eval=FALSE} -tactile_prop_red %>% - summarize(mean = mean(prop_red), sd = sd(prop_red)) -``` -```{r, echo=FALSE} -summary_stats <- tactile_prop_red %>% - summarize(mean = mean(prop_red), sd = sd(prop_red)) -summary_stats %>% - kable(digits = 3) %>% - kable_styling(font_size = ifelse(knitr:::is_latex_output(), 10, 16), - latex_options = c("HOLD_position")) +```{r, eval=TRUE, echo=FALSE} +comparing_n_table %>% + kable(digits = 3) ``` -Finally, it's important to keep in mind: +So as the number of slots in the shovel increased, this standard deviation decreased. These types of standard deviations have another special name: **standard errors**; they quantify the effect of sampling variation induced on our estimates. In other words, they are quantifying how much we can expect different proportions of a shovel's balls that are red to vary from random sample to random sample. -1. If the sampling is done in an unbiased and random fashion, in other words we made sure to stir the bowl before we sampled, then the sampling distribution will be guaranteed to be centered at the true unknown population proportion red $p$, or in other words the true number of balls out of 2400 that are red. -1. The spread of this histogram, as quantified by the standard deviation of `r summary_stats %>% pull(sd) %>% round(3)`, is called the **standard error**. It quantifies the uncertainty of our estimates of $p$, which recall are called $\widehat{p}$. - + **Note**: A large source of confusion. All standard errors are a form of standard deviation, but not all standard deviations are standard errors. +Unfortunately, many new statistics practitioners get confused by these names. For example, it's common for people new to statistical inference to call the "sampling distribution" the "sample distribution". Another additional source of confusion is the name "standard deviation" and "standard error". Remember that a standard error is merely a *kind* of standard deviation: the standard deviation of any point estimate from a sampling scenario. In other words, all standard errors are standard deviations, but not all standard deviations are a standard error. +To help reinforce these concepts, let's re-display Figure \@ref(fig:comparing-sampling-distributions) but using our new terminology, notation, and definitions relating to sampling in Figure \@ref(fig:comparing-sampling-distributions-2). -* sampling distribution -* standard error +```{r comparing-sampling-distributions-2, echo=FALSE, fig.cap="Three sampling distributions of the sample proportion $\\widehat{p}$."} +virtual_prop %>% + mutate( + n = str_c("n = ", n), + n = factor(n, levels = c("n = 25", "n = 50", "n = 100")) + ) %>% + ggplot( aes(x = prop_red)) + + geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") + + labs(x = expression(paste("Sample proportion ", hat(p))), + title = expression(paste("Sampling distributions of the sample proportion ", hat(p), " based on n = 25, 50, 100.")) ) + + facet_wrap(~n) +``` - +Furthermore, let's re-display Table \@ref(tab:comparing-n) but using our new terminology, notation, and definitions relating to sampling in Table \@ref(tab:comparing-n-2). 
+```{r comparing-n-2, eval=TRUE, echo=FALSE} +comparing_n_table <- virtual_prop %>% + group_by(n) %>% + summarize(sd = sd(prop_red)) %>% + mutate( + n = str_c("n = ", n), + n = factor(n, levels = c("n = 25", "n = 50", "n = 100")) + ) %>% + rename(`Sample size` = n, `Standard error of $\\widehat{p}$` = sd) +comparing_n_table %>% + kable( + digits = 3, + caption = "Three standard errors of the sample proportion $\\widehat{p}$ based on n = 25, 50, 100. ", + booktabs = TRUE +) %>% + kable_styling(font_size = ifelse(knitr:::is_latex_output(), 10, 16), + latex_options = c("HOLD_position")) +``` -Now let's mimic the above *tactile* sampling, but with *virtual* sampling. We'll resort to virtual sampling because while collecting 33 tactile samples manually is feasible, for large numbers like 1000, things start getting tiresome! That's where a computer can really help: computers excel at performing mundane tasks repeatedly; think of what accounting software must be like! + -In Figure \@ref(fig:samplingdistribution-virtual), we can start seeing a pattern in the sampling distribution emerge. However, 33 values of the sample proportion $\widehat{p}$ might not be enough to get a true sense of the distribution. Using 1000 values of $\widehat{p}$ would definitely give a better sense. What are our two options for constructing these histograms? +Remember the key message of this last table: that as the sample size $n$ goes up, the "typical" error of your point estimate as quantified by the standard error will go down. -1. Tactile sampling: Make the 33 groups of students take $1000 / 33 \approx 31$ samples of size $n=50$ each, count the number of red balls for each of the 1000 tactile samples, and then compute the 1000 corresponding values of the sample proportion $\widehat{p}$. However, this would be cruel and unusual as this would take hours! -1. Virtual sampling: Computers are very good at automating repetitive tasks such as this one. This is the way to go! -First, generate 1000 samples of size $n=50$ +### The moral of the story -```{r, eval=FALSE} -virtual_samples <- bowl %>% - rep_sample_n(size = 50, reps = 1000) -View(virtual_samples) -``` -```{r, echo=FALSE} -virtual_samples <- bowl %>% - rep_sample_n(size = 50, reps = 1000) -``` +Let's recap this section so far. We've seen that if a sample is generated at random, then the resulting point estimate is a "good guess" of the true unknown population parameter. In our simulations, since we made sure to mix the balls first before extracting a sample with the shovel, the resulting sample proportion $\widehat{p}$ of the shovel's balls that were red was a "good guess" of the population proportion $p$ of the bowl's balls that were red. -Then for each of these 1000 samples of size $n=50$, compute the corresponding sample proportions +However, what do we mean by our point estimate being a "good guess"? While sometimes we'll obtain a point estimate less than the true value of the unknown population parameter, other times we'll obtain a point estimate greater than the true value of the unknown population parameter, this is because of sampling variation. However despite this sampling variation, our point estimates will "on average" be correct. In our simulations, sometimes our sample proportion $\widehat{p}$ was less than the true population proportion $p$, other times the sample proportion $\widehat{p}$ was greater than the true population proportion $p$. This was due to the sampling variability induced by the mixing. 
However despite this sampling variation, our sample proportions $\widehat{p}$ were always centered around the true population proportion. This is also known as having an **accurate** estimate.

-```{r, eval=FALSE}
-virtual_prop_red <- virtual_samples %>%
-  group_by(replicate) %>%
-  summarize(red = sum(color == "red")) %>%
-  mutate(prop_red = red / 50)
-View(virtual_prop_red)
-````
-```{r, echo=FALSE}
-virtual_prop_red <- virtual_samples %>%
-  group_by(replicate) %>%
-  summarize(red = sum(color == "red")) %>%
-  mutate(prop_red = red / 50)
```

+What was the value of the population proportion $p$ of the $N$ = 2400 balls in the actual bowl? There were 900 red balls, for a proportion red of 900/2400 = 0.375 = 37.5%! How do we know this? Did the authors do an exhaustive count of all the balls? No! They were listed on the contents of the box that the bowl came in. Hence we made the contents of the virtual `bowl` match the tactile bowl:
+
+```{r}
+bowl %>%
+  summarize(sum_red = sum(color == "red"),
+            sum_not_red = sum(color != "red"))
```

+Let's re-display our sampling distributions from Figures \@ref(fig:comparing-sampling-distributions) and \@ref(fig:comparing-sampling-distributions-2), but now with a vertical red line marking the true population proportion $p$ of balls that are red = 37.5% in Figure \@ref(fig:comparing-sampling-distributions-3). We see that while there is a certain amount of error in the sample proportions $\widehat{p}$ for all three sampling distributions, on average the $\widehat{p}$ are centered at the true population proportion red $p$.

+```{r comparing-sampling-distributions-3, echo=FALSE, fig.cap="Three sampling distributions with population proportion $p$ marked in red."}
+p <- bowl %>%
+  summarize(p = mean(color == "red")) %>%
+  pull(p)
+virtual_prop %>%
+  mutate(
+    n = str_c("n = ", n),
+    n = factor(n, levels = c("n = 25", "n = 50", "n = 100"))
+  ) %>%
+  ggplot( aes(x = prop_red)) +
+  geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
+  labs(x = expression(paste("Sample proportion ", hat(p))),
+       title = expression(paste("Sampling distributions of the sample proportion ", hat(p), " based on n = 25, 50, 100.")) ) +
+  facet_wrap(~n) +
+  geom_vline(xintercept = p, col = "red", size = 1)
```

-```{r echo=FALSE, fig.cap="Sampling distribution of 1000 sample proportions based on 1000 tactile samples with n=50"}
-virtual_prop_red <- virtual_samples %>%
-  group_by(replicate) %>%
-  summarize(red = sum(color == "red")) %>%
-  mutate(prop_red = red / 50)
-ggplot(virtual_prop_red, aes(x = prop_red)) +
-  geom_histogram(binwidth = 0.05, color = "white") +
-  labs(
-    x = expression(paste("Sample proportion red ", hat(p), " based on n = 50")),
-    title = expression(paste("Sampling distribution of ", hat(p)))
-  )
```

+We also saw in this section that as your sample size $n$ increases, your point estimates will vary less and less and be more and more concentrated around the true population parameter; this is quantified by the decreasing standard error. In other words, the typical error of your point estimates will decrease.
In our simulations, as the sample size increases, the spread/variation of our sample proportions $\widehat{p}$ around the true population proportion $p$ decreases. You can observe this behavior as well in Figure \@ref(fig:comparing-sampling-distributions-3). This is also known as having a more **precise** estimate.

-Since the sampling is random and thus representative and unbiased, the above sampling distribution is centered at the true population proportion red $p$ of all $N=2400$ balls in the bowl. Eyeballing it, the sampling distribution appears to be centered at around 0.375.
+So random sampling ensures our point estimates are accurate, while having a large sample size ensures our point estimates are precise. While accuracy and precision may sound like the same concept, they are actually not. Accuracy relates to how "on target" our estimates are, whereas precision relates to how "consistent" our estimates are. Figure \@ref(fig:accuracy-vs-precision) illustrates the difference.

-What is the standard error of the above sampling distribution of $\widehat{p}$ based on 1000 samples of size $n=50$?
+
```{r accuracy-vs-precision, echo=FALSE, fig.cap="Comparing accuracy and precision", purl=FALSE, out.width = "50%"}
knitr::include_graphics("images/accuracy_vs_precision.jpg")
```

-```{r}
-virtual_prop_red %>%
-  summarize(SE = sd(prop_red))
```

-What this value is saying might not be immediately apparent by itself to someone who is new to sampling. It's best to first compare different standard errors for different sampling schemes based on different sample sizes $n$. We'll do so for samples of size $n=25$, $n=50$, and $n=100$ next.
+At this point you might be asking yourself: "If you already knew the true proportion of the bowl's balls that are red was 37.5%, then why did we do any of this?" In other words, "If you already knew the value of the true unknown population parameter, then why did we do any sampling?" You might also be asking: "Why did we take 1000 repeated/replicated samples of size n = 25, 50, and 100? Shouldn't we be taking only *one* sample that's as large as possible?" Recall our definition of a simulation from Section \@ref(sampling-simulation): an approximate imitation of the operation of a process or system. We performed these simulations to study:

----

+1. The effect of sampling variation on our estimates.
+1. The effect of sample size on sampling variation.

+In a real-life scenario, we won't know what the true value of the population parameter is, and furthermore we won't take repeated/replicated samples but rather a single sample that's as large as we can afford. This was also done to show the power of the technique of sampling when trying to estimate a population parameter. Since we knew the value was 37.5%, we could show just how well the different sample sizes approximated this value in their sampling distributions. We present one case study of a real-life sampling scenario in the next section: polling.

-## Interpretation {#sampling-intepretation}

-At this point, you might be saying to yourself: "Big deal, why do we care about this bowl?" As hopefully you'll soon come to appreciate, this sampling bowl exercise is merely a **simulation** representing the reality of many important sampling scenarios in a simplified and accessible setting. One in particular sampling scenario is familiar to many: polling.
Whether for market research or for political purposes, polls inform much of the world's decision and opinion making, and understanding the mechanism behind them can better inform you statistical citizenship. We'll tie-in everything we learn in this chapter with an example relating to a 2013 poll on President Obama's approval ratings among young adults in Section \@ref(polls). - - - ---- +*** ## Case study: Polls {#sampling-case-study} -In December 4, 2013 National Public Radio reported on a recent poll of President Obama's approval rating among young Americans aged 18-29 in an article [Poll: Support For Obama Among Young Americans Eroding](https://www.npr.org/sections/itsallpolitics/2013/12/04/248793753/poll-support-for-obama-among-young-americans-eroding). A quote from the article: +In December 4, 2013 National Public Radio in the US reported on a recent, at the time, poll of President Obama's approval rating among young Americans aged 18-29 in an article [Poll: Support For Obama Among Young Americans Eroding](https://www.npr.org/sections/itsallpolitics/2013/12/04/248793753/poll-support-for-obama-among-young-americans-eroding). A quote from the article: > After voting for him in large numbers in 2008 and 2012, young Americans are souring on President Obama. > > According to a new Harvard University Institute of Politics poll, just 41 percent of millennials — adults ages 18-29 — approve of Obama's job performance, his lowest-ever standing among the group and an 11-point drop from April. -Let's tie elements of this story using the concepts and terminology we learned at the outset of this chapter along with our observations from the tactile and virtual sampling simulations: +Let's tie elements of the real-life poll in this new article with our "tactile" and "virtual" simulations from Sections \@ref(sampling-activity) and \@ref(sampling-simulation) using the terminology, notations, and definitions we learned in Section \@ref(sampling-framework). -1. **Population**: Who is the population of $N$ observations of interest? - + Bowl: $N=2400$ identically-shaped balls - + Obama poll: $N = \text{?}$ young Americans aged 18-29 +1. **(Study) Population**: Who is the population of $N$ individuals or observations of interest? + + Simulation: $N$ = 2400 identically-sized red and white balls + + Obama poll: $N$ = ? young Americans aged 18-29 1. **Population parameter**: What is the population parameter? - + Bowl: The true population proportion $p$ of the balls in the bowl that are red. - + Obama poll: The true population proportion $p$ of young Americans who approve of Obama's job performance. -1. **Census**: What would a census be in this case? - + Bowl: Manually going over all $N=2400$ balls and exactly computing the population proportion $p$ of the balls that are red. - + Obama poll: Locating all $N = \text{?}$ young Americans (which is in the millions) and asking them if they approve of Obama's job performance. This would be quite expensive to do! -1. **Sampling**: How do you acquire the sample of size $n$ observations? - + Bowl: Using the shovel to extract a sample of $n=50$ balls. - + Obama poll: One way would be to get phone records from a database and pick out $n$ phone numbers. In the case of the above poll, the sample was of size $n=2089$ young adults. -1. **Point estimates/sample statistics**: What is the summary statistic based on the sample of size $n$ that *estimates* the unknown population parameter? 
- + Bowl: The *sample proportion $\widehat{p}$* red of the balls in the sample of size $n=50$. - + Key: The sample proportion red $\widehat{p}$ of young Americans in the sample of size $n=2089$ that approve of Obama's job performance. In this study's case, $\widehat{p} = 0.41$ which is the quoted 41% figure in the article. -1. **Representative sampling**: Is the sample procedure *representative*? In other words, to the resulting samples "look like" the population? - + Bowl: Does our sample of $n=50$ balls "look like" the contents of the larger set of $N=2400$ balls in the bowl? - + Obama poll: Does our sample of $n=2089$ young Americans "look like" the population of all young Americans aged 18-29? + + Simulation: The population proportion $p$ of ALL the balls in the bowl that are red. + + Obama poll: The population proportion $p$ of ALL young Americans who approve of Obama's job performance. +1. **Census**: What would a census look like? + + Simulation: Manually going over all $N$ = 2400 balls and exactly computing the population proportion $p$ of the balls that are red, a time consuming task. + + Obama poll: Locating all $N$ = ? young Americans and asking them all if they approve of Obama's job performance, an expensive task. +1. **Sampling**: How do you collect the sample of size $n$ individuals or observations? + + Simulation: Using a shovel with $n$ slots. + + Obama poll: One method is to get a list of phone numbers of all young Americans and pick out $n$ phone numbers. In this poll's case, the sample size of this poll was $n$ = 2089 young Americans. +1. **Point estimate (AKA sample statistic)**: What is your estimate of the unknown population parameter? + + Simulation: The sample proportion $\widehat{p}$ of the balls in the shovel that were red. + + Obama poll: The sample proportion $\widehat{p}$ of young Americans in the sample that approve of Obama's job performance. In this poll's case, $\widehat{p}$ = 0.41 = 41%, the quoted percentage in the second paragraph of the article. +1. **Representative sampling**: Is the sampling procedure *representative*? + + Simulation: Are the contents of the shovel representative of the contents of the bowl? + + Obama poll: Is the sample of $n$ = 2089 young Americans representative of all young Americans aged 18-29? 1. **Generalizability**: Are the samples *generalizable* to the greater population? - + Bowl: Is $\widehat{p}$ a "good guess" of $p$? - + Obama poll: Is $\widehat{p} = 0.41$ a "good guess" of $p$? In other words, can we confidently say that 41% of *all* young Americans approve of Obama. + + Simulation: Is the sample proportion $\widehat{p}$ of the shovel's balls that are red a "good guess" of the population proportion $p$ of the bowl's balls that are red? + + Obama poll: Is the sample proportion $\widehat{p}$ = 0.41 of the sample of young Americans who support Obama a "good guess" of the population proportion $p$ of all young Americans who support Obama? In other words, can we confidently say that 41% of *all* young Americans approve of Obama? 1. **Bias**: Is the sampling procedure unbiased? In other words, do all observations have an equal chance of being included in the sample? - + Bowl: Here, I would say it is unbiased. All balls are equally sized as evidenced by the slots of the $n=50$ shovel, and thus no particular color of ball can be favored in our samples over others. - + Obama poll: Did all young Americans have an equal chance at being represented in this poll? 
For example, if this was conducted using a database of only mobile phone numbers, would people without mobile phones be included? What about if this were an internet poll on a certain news website? Would non-readers of this this website be included? + + Simulation: Since each ball was equally sized, each ball had an equal chance of being included in a shovel's sample, and hence the sampling was unbiased. + + Obama poll: Did all young Americans have an equal chance at being represented in this poll? For example, if this was conducted using only mobile phone numbers, would people without mobile phones be included? What if those who disapproved of Obama were less likely to agree to take part in the poll? What about if this were an internet poll on a certain news website? Would non-readers of this website be included? We need to ask the Harvard University Institute of Politics pollsters about their *sampling methodology*. 1. **Random sampling**: Was the sampling random? - + Bowl: As long as you mixed the bowl sufficiently before sampling, your samples would be random? - + Obama poll: Random sampling is a necessary assumption for all of the above to work. Most articles reporting on polls take this assumption as granted. In our Obama poll, you'd have to ask the group that conducted the poll: The Harvard University Institute of Politics. + + Simulation: As long as you mixed the bowl sufficiently before sampling, your samples would be random. + + Obama poll: Was the sample conducted at random? We need to ask the Harvard University Institute of Politics pollsters about their *sampling methodology*. -Recall the punchline of all the above: +Once again, let's revisit the sampling paradigm: > * If the sampling of a sample of size $n$ is done at **random**, then -> * The sample is **unbiased** and **representative** of the population, thus -> * Any result based on the sample can **generalize** to the population, thus -> * The **point estimate/sample statistic** is a "good guess" of the unknown population parameter of interest +> * the sample is **unbiased** and **representative** of the population of size $N$, thus +> * any result based on the sample can **generalize** to the population, thus +> * the point estimate is a **"good guess"** of the unknown population parameter, thus +> * instead of performing a census, we can **infer** about the population using sampling. -and thus we have *inferred* about the population based on our sample. In the bowl example: +In our simulations using the shovel with 50 slots: -> * If we properly mix the balls by say stirring the bowl first, then use the shovel to extract a sample of size $n=50$, then -> * The contents of the shovel will "look like" the contents of the bowl, thus -> * Any results based on the sample of $n=50$ balls can generalize to the large bowl of $N=2400$ balls, thus -> * The sample proportion $\widehat{p}$ of the $n=50$ sampled balls in the shovel that are red is a "good guess" of the true population proportion $p$ of the $N=2400$ balls that are red. 
+> * If we extract a sample of $n$ = 50 balls at **random**, in other words we mix the equally-sized balls before using the shovel, then
+> * the contents of the shovel are an **unbiased representation** of the contents of the bowl's 2400 balls, thus
+> * any result based on the sample of balls can **generalize** to the bowl, thus
+> * the sample proportion $\widehat{p}$ of the $n$ = 50 balls in the shovel that are red is a **"good guess"** of the population proportion $p$ of the $N$ = 2400 balls that are red, thus
+> * instead of manually going over all the balls in the bowl, we can **infer** about the bowl using the shovel.

-and thus we have inferred some new piece of information about the bowl based on our sample extracted by shovel: the proportion of balls that are red. In the Obama poll example:
+In the real-life Obama poll:

-> * If we had a way of contacting a randomly chosen sample of 2089 young Americans and poll their approval of Obama, then
-> * These 2089 young Americans would "look like" the population of all young Americans, thus
-> * Any results based on this sample of 2089 young Americans can generalize to entire population of all young Americans, thus
-> * The reported sample approval rating of 41% of these 2089 young Americans is a "good guess" of the true approval rating amongst *all* young Americans.
+> * If we had a way of contacting a **randomly** chosen sample of 2089 young Americans and polling their approval of Obama, then
+> * these 2089 young Americans would be an **unbiased** and **representative** sample of *all* young Americans, thus
+> * any results based on this sample of 2089 young Americans can **generalize** to the entire population of all young Americans, thus
+> * the reported sample approval rating of 41% of these 2089 young Americans is a **good guess** of the true approval rating among all young Americans, thus
+> * instead of performing a highly costly census of all young Americans, we can **infer** about all young Americans using polling.

-So long story short, this poll's guess of Obama's approval rating was 41%. However is this the end of the story when understanding the results of a poll? If you read further in the article, it states:

-> The online survey of 2,089 adults was conducted from Oct. 30 to Nov. 11, just weeks after the federal government shutdown ended and the problems surrounding the implementation of the Affordable Care Act began to take center stage. The poll's margin of error was plus or minus 2.1 percentage points.

-Note the term *margin of error*, which here is plus or minus 2.1 percentage points. This is saying that a typical range of errors for polls of this type is about $\pm 2.1\%$, in words from about 2.1% too small to about 2.1% too big. These errors are caused by *sampling variation*, the same sampling variation you saw studied in the histograms in Sections \@ref(tactile) on our tactile sampling simulations and Sections \@ref(virtual) on our virtual sampling simulations.

-In this case of polls, any variation from the true approval rating is an "error" and a reasonable range of errors is the margin of error. We'll see in the next chapter that this what's known as a 95% confidence interval for the unknown approval rating. We'll study confidence intervals using a new package for our data science and statistical toolbox: the `infer` package for statistical inference.
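Before moving on, here is a small illustrative sketch linking the poll back to our virtual simulations. It assumes, purely for the sake of illustration, a hypothetical population of young Americans in which the true approval rating is 41%; it then simulates 1000 polls of size $n$ = 2089 to show how much such poll results would vary from sample to sample.

```{r, eval=FALSE}
library(dplyr)
library(ggplot2)

# Hypothetical illustration only: assume a true approval rating of 0.41 and
# simulate 1000 polls, each based on n = 2089 randomly sampled young Americans.
simulated_polls <- tibble(
  replicate = 1:1000,
  approval_rating = rbinom(1000, size = 2089, prob = 0.41) / 2089
)

ggplot(simulated_polls, aes(x = approval_rating)) +
  geom_histogram(binwidth = 0.005, color = "white") +
  labs(x = "Simulated sample approval rating",
       title = "Sampling variation across 1000 hypothetical polls of size n = 2089")
```

Under this made-up setup, most of the simulated approval ratings land within about two percentage points of 41%, which foreshadows the poll's reported margin of error that we revisit at the end of this chapter.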
+*** +## Conclusion {#sampling-conclusion} ---- + -## Conclusion {#sampling-conclusion} -### Table of inference scenarios {#sampling-conclusion-table} +### Central Limit Theorem {#sampling-conclusion-central-limit-theorem} + +What you did in Sections \@ref(sampling-activity) and \@ref(sampling-simulation) (in particular in Figure \@ref(fig:comparing-sampling-distributions) and Table \@ref(tab:comparing-n)) was demonstrate a very famous theorem, or mathematically proven truth, called the *Central Limit Theorem*. It loosely states that when sample means and sample proportions are based on larger and larger sample sizes, the sampling distribution of these two point estimates become more and more normally shaped and more and more narrow. In other words, their sampling distributions become more normally distributed and the spread/variation of these sampling distributions as quantified by their standard errors gets smaller. Shuyi Chiou, Casey Dunn, and Pathikrit Bhattacharyya created the following 3m38s video at https://www.youtube.com/embed/jvoxEYmQHNM explaining this crucial statistical theorem using the average weight of wild bunny rabbits and the average wing span of dragons as examples. Enjoy! + +
+ +
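If you'd like to see the Central Limit Theorem in action yourself with sample means rather than sample proportions, here is a small optional sketch. It uses a made-up, right-skewed population of 10,000 values (not the bowl) and the `rep_sample_n()` function from earlier in this chapter:

```{r, eval=FALSE}
library(dplyr)
library(ggplot2)
library(moderndive)  # rep_sample_n(); also available in the infer package

# A made-up right-skewed "population" of 10,000 values
skewed_population <- tibble(value = rexp(10000, rate = 1))

# 1000 replicates of the sample mean for each of n = 2, 10, and 50
sample_means <- bind_rows(
  skewed_population %>% rep_sample_n(size = 2, reps = 1000) %>%
    summarize(sample_mean = mean(value)) %>% mutate(n = 2),
  skewed_population %>% rep_sample_n(size = 10, reps = 1000) %>%
    summarize(sample_mean = mean(value)) %>% mutate(n = 10),
  skewed_population %>% rep_sample_n(size = 50, reps = 1000) %>%
    summarize(sample_mean = mean(value)) %>% mutate(n = 50)
)

# As n increases, the sampling distributions become more bell-shaped and narrower
ggplot(sample_means, aes(x = sample_mean)) +
  geom_histogram(binwidth = 0.05, color = "white") +
  facet_wrap(~n) +
  labs(x = "Sample mean", title = "Sampling distributions of the sample mean")
```

Even though this population is heavily skewed, the sampling distribution of the sample mean looks roughly normal by the time $n$ = 50, which is exactly what the Central Limit Theorem promises.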
+
+
+### Summary table {#sampling-conclusion-table}
+
+In this chapter, we performed both tactile and virtual simulations of sampling to infer about an unknown proportion. We also presented a case study of sampling in a real-life situation: polls. In both cases, we used the sample proportion $\widehat{p}$ to estimate the population proportion $p$. However, we are not just limited to scenarios related to statistical inference for proportions. In other words, we can consider population parameter and point estimate scenarios other than the population proportion $p$ and sample proportion $\widehat{p}$ studied in this chapter. We present 5 more such scenarios in Table \@ref(tab:summarytable-ch8).
+
+Note that the sample mean is traditionally denoted as $\overline{x}$ but can also be thought of as an estimate of the population mean $\mu$. Thus, it can also be denoted as $\widehat{\mu}$ as shown below in the table.
+
```{r summarytable-ch8, echo=FALSE, message=FALSE}
# The following Google Doc is published to CSV and loaded below using read_csv() below:
# https://docs.google.com/spreadsheets/d/1QkOpnBGqOXGyJjwqx1T2O5G5D72wWGfWlPyufOgtkk4/edit#gid=0
@@ -831,7 +833,7 @@ In this case of polls, any variation from the true approval rating is an "error"
"https://docs.google.com/spreadsheets/d/e/2PACX-1vRd6bBgNwM3z-AJ7o4gZOiPAdPfbTp_V15HVHRmOH5Fc9w62yaG-fEKtjNUD2wOSa5IJkrDMaEBjRnA/pub?gid=0&single=true&output=csv" %>%
  read_csv(na = "") %>%
  kable(
-    caption = "\\label{tab:summarytable}Scenarios of sampling for inference",
+    caption = "\\label{tab:summarytable-ch8}Scenarios of sampling for inference",
    booktabs = TRUE,
    escape = FALSE
  ) %>%
@@ -844,50 +846,27 @@ In this case of polls, any variation from the true approval rating is an "error"
  column_spec(5, width = "1in")
```

-We'll cover the first four scenarios in this chapter on confidence intervals and the following one on hypothesis testing:
-
-* Scenario 2 about means. Ex: the average age of pennies.
-* Scenario 3 about differences in proportions between two groups. Ex: the difference in high school completion rates for Canadians vs non-Canadians. We call this a situation of *two-sample* inference.
-* Scenario 4 is similar to 3, but its about the means of two groups. Ex: the difference in average test scores for the morning section of a class versus the afternoon section of a class. This is another situation of *two-sample* inference.

-In Chapter \@ref(inference-for-regression) on inference for regression, we'll cover Scenarios 5 & 6 about the regression line. In particular we'll see that the fitted regression line from Chapter \@ref(regression) on basic regression, $\widehat{y} = b_0 + b_1 \cdot x$, is in fact an estimate of some true population regression line $y = \beta_0 + \beta_1 \cdot x$ based on a sample of $n$ pairs of points $(x, y)$. Ex: Recall our sample of $n=463$ instructors at the UT Austin from the `evals` data set in Chapter \@ref(regression). Based on the results of the fitted regression model of teaching score with beauty score as an explanatory/predictor variable, what can we say about this relationship for *all* instructors, not just those at the UT Austin?

-In most cases, we don't have the population values as we did with the `bowl` of balls. We only have a single sample of data from a larger population. We'd like to be able to make some reasonable guesses about population parameters using that single sample to create a range of plausible values for a population parameter.
This range of plausible values is known as a **confidence interval** and will be the focus of this chapter. And how do we use a single sample to get some idea of how other samples might vary in terms of their statistic values? One common way this is done is via a process known as **bootstrapping** that will be the focus of the beginning sections of this chapter. - - -### Random sampling vs random assignment {#sampling-conclusion-sampling-vs-assignment} - +We'll cover all the remaining scenarios as follows, using the terminology, notation, and definitions related to sampling you saw in Section \@ref(sampling-framework): +* In Chapter \@ref(confidence-intervals), we'll cover examples of statistical inference for + + Scenario 2: The mean age $\mu$ of all pennies in circulation in the US. + + Scenario 3: The difference $p_1 - p_2$ in the proportion of people who yawn when seeing someone else yawn and the proportion of people who yawn without seeing someone else yawn. This is an example of *two-sample* inference. +* In Chapter \@ref(hypothesis-testing), we'll cover an example of statistical inference for + + Scenario 4: The difference $\mu_1 - \mu_2$ in average IMDB ratings for action and romance movies. This is another example of *two-sample* inference. +* In Chapter \@ref(inference-for-regression), we'll cover an example of statistical inference for the relationship between teaching score and various instructor demographic variables you saw in Chapter \@ref(regression) on basic regression and Chapter \@ref(multiple-regression) on multiple regression. Specifically + + Scenario 5: The intercept $\beta_0$ of some population regression line. + + Scenario 6: The slope $\beta_1$ of some population regression line. -### Theory: Central Limit Theorem {#sampling-conclusion-central-limit-theorem} - -What you did in Section \@ref(tactile) and \@ref(virtual) was demonstrate a very famous theorem, or mathematically proven truth, called the *Central Limit Theorem*. It loosely states that when sample means and sample proportions are based on larger and larger samples, the sampling distribution corresponding to these point estimates get - -1. More and more normal -1. More and more narrow - -Shuyi Chiou, Casey Dunn, and Pathikrit Bhattacharyya created the following three minute and 38 second video explaining this crucial theorem to statistics using as examples, what else? - -1. The average weight of wild bunny rabbits! -1. The average wing span of dragons! - -
- -
- - -### Formula: Standard error {#sampling-conclusion-standard-error} -### Closing notes - -This chapter serves as an introduction to the theoretical underpinning of the statistical inference techniques that will be discussed in greater detail in Chapter \@ref(confidence-intervals) for confidence intervals and Chapter \@ref(hypothesis-testing) for hypothesis testing. +### Additional resources An R script file of all R code used in this chapter is available [here](scripts/08-sampling.R). +### What's to come? +Recall in our Obama poll case study in Section \@ref(sampling-case-study) that based on this particular sample, the Harvard University Institute of Politics' best guess of Obama's approval rating among all young Americans was 41%. However, this isn't the end of the story. If you read further in the article, it states: +> The online survey of 2,089 adults was conducted from Oct. 30 to Nov. 11, just weeks after the federal government shutdown ended and the problems surrounding the implementation of the Affordable Care Act began to take center stage. The poll's margin of error was plus or minus 2.1 percentage points. - - - +Note the term *margin of error*, which here is plus or minus 2.1 percentage points. What this is saying is that most polls won't get it perfectly right; there will always be a certain amount of error caused by *sampling variation*. The margin of error of plus or minus 2.1 percentage points is saying that a typical range of errors for polls of this type is about $\pm$ 2.1%, in words from about 2.1% too small to about 2.1% too big for an interval of [41% - 2.1%, 41% + 2.1%] = [37.9%, 43.1%]. Remember that this notation corresponds to 37.9% and 43.1% being included as well as all numbers between the two of them. We'll see in the next chapter that such intervals are known as *confidence intervals*. diff --git a/09-confidence-intervals.Rmd b/09-confidence-intervals.Rmd index 76ca4e0f7..d249b4b88 100755 --- a/09-confidence-intervals.Rmd +++ b/09-confidence-intervals.Rmd @@ -21,6 +21,22 @@ options(scipen = 99, digits = 3) set.seed(76) ``` + + +*** + + + +```{block, type='announcement', purl=FALSE} +**In preparation for our first print edition to be published by CRC Press in Fall 2019, we're remodeling this chapter a bit. Don't expect major changes in content, but rather only minor changes in presentation. Our remodeling will be complete and available online at [ModernDive.com](https://moderndive.com/) by early Summer 2019!** +``` + + + +*** + + + In Chapter \@ref(sampling), we explored the process of sampling from a representative sample to build a sampling distribution. The motivation there was to use multiple samples from the same population to visualize and attempt to understand the variability in the statistic from one sample to another. 
Furthermore, recall our concepts and terminology related to sampling from the beginning of Chapter \@ref(sampling): Generally speaking, we learned that if the sampling of a sample of size $n$ is done at *random*, then the resulting sample is *unbiased* and *representative* of the *population*, thus any result based on the sample can *generalize* to the population, and hence the **point estimate/sample statistic** computed from this sample is a "good guess" of the unknown population parameter of interest @@ -89,7 +105,7 @@ library(infer) ---- +*** @@ -324,7 +340,7 @@ knitr::include_graphics("images/flowcharts/infer/ci_diagram.png") ---- +*** @@ -524,7 +540,7 @@ If we aren't able to use the sample mean as a good guess for the population mean ---- +*** @@ -661,7 +677,7 @@ After this elaboration on what the level corresponds to in a confidence interval ---- +*** @@ -948,7 +964,7 @@ Theoretical methods like this have largely been used in the past since we didn't ---- +*** @@ -1088,7 +1104,7 @@ Practice problems to come soon! ---- +*** diff --git a/10-hypothesis-testing.Rmd b/10-hypothesis-testing.Rmd index c58aa5083..b2696863d 100755 --- a/10-hypothesis-testing.Rmd +++ b/10-hypothesis-testing.Rmd @@ -21,6 +21,22 @@ options(scipen = 99, digits = 3) set.seed(76) ``` + + +*** + + + +```{block, type='announcement', purl=FALSE} +**In preparation for our first print edition to be published by CRC Press in Fall 2019, we're remodeling this chapter a bit. Don't expect major changes in content, but rather only minor changes in presentation. Our remodeling will be complete and available online at [ModernDive.com](https://moderndive.com/) by early Summer 2019!** +``` + + + +*** + + + We saw some of the main concepts of hypothesis testing introduced in Chapters \@ref(sampling) and \@ref(confidence-intervals). We will expand further on these ideas here and also provide a framework for understanding hypothesis tests in general. Instead of presenting you with lots of different formulas and scenarios, we hope to build a way to think about all hypothesis tests. You can then adapt to different scenarios as needed down the road when you encounter different statistical situations. The same can be said for confidence intervals. There was one general framework that applies to all confidence intervals and we elaborated on this using the `infer` package pipeline in Chapter \@ref(confidence-intervals). The specifics may change slightly for each variation, but the important idea is to understand the general framework so that you can apply it to more specific problems. We believe that this approach is much better in the long-term than teaching you specific tests and confidence intervals rigorously. You can find fully-worked out examples for five common hypothesis tests and their corresponding confidence intervals in Appendix \@ref(appendixB). @@ -47,7 +63,7 @@ library(knitr) ---- +*** @@ -77,7 +93,7 @@ library(knitr) ---- +*** @@ -126,7 +142,7 @@ As you get more and more practice with hypothesis testing, you'll be better able ---- +*** @@ -150,7 +166,7 @@ Before we hop into this framework, we will provide another way to think about hy ---- +*** @@ -193,7 +209,7 @@ When you run a hypothesis test, you are the jury of the trial. 
You decide wheth ---- +*** @@ -252,7 +268,7 @@ So if we can set $\alpha$ to be whatever we want, why choose 0.05 instead of 0.0 ---- +*** @@ -275,7 +291,7 @@ The idea that sample results are more extreme than we would reasonably expect to ---- +*** @@ -295,7 +311,7 @@ We'll first explore the two variable case by comparing two means. Note the secti ---- +*** @@ -651,7 +667,7 @@ we fail to reject $H_0$. (If no significance level is given, one can assume $\a ---- +*** @@ -663,7 +679,7 @@ These traditional methods have been used for many decades back to the time when ### Example: $t$-test for two independent samples -What is commonly done in statistics is the process of normalization. What this entails is calculating the mean and standard deviation of a variable. Then you subtract the mean from each value of your variable and divide by the standard deviation. The most common normalization is known as the $z$-score. The formula for a $z$-score is $$Z = \frac{x - \mu}{\sigma},$$ where $x$ represent the value of a variable, $\mu$ represents the mean of the variable, and $\sigma$ represents the standard deviation of the variable. Thus, if your variable has 10 elements, each one has a corresponding $z$-score that gives how many standard deviations away that value is from its mean. $z$-scores are normally distributed with mean 0 and standard deviation 1. They have the common, bell-shaped pattern seen below. +What is commonly done in statistics is the process of standardization. What this entails is calculating the mean and standard deviation of a variable. Then you subtract the mean from each value of your variable and divide by the standard deviation. The most common standardization is known as the $z$-score. The formula for a $z$-score is $$Z = \frac{x - \mu}{\sigma},$$ where $x$ represent the value of a variable, $\mu$ represents the mean of the variable, and $\sigma$ represents the standard deviation of the variable. Thus, if your variable has 10 elements, each one has a corresponding $z$-score that gives how many standard deviations away that value is from its mean. $z$-scores are normally distributed with mean 0 and standard deviation 1. They have the common, bell-shaped pattern seen below. ```{r echo=FALSE} ggplot(data.frame(x = c(-4, 4)), aes(x)) + stat_function(fun = dnorm) @@ -671,7 +687,7 @@ ggplot(data.frame(x = c(-4, 4)), aes(x)) + stat_function(fun = dnorm) Recall, that we hardly ever know the mean and standard deviation of the population of interest. This is almost always the case when considering the means of two independent groups. To help account for us not knowing the population parameter values, we can use the sample statistics instead, but this comes with a bit of a price in terms of complexity. -Another form of normalization occurs when we need to use the sample standard deviations as estimates for the unknown population standard deviations. This normalization is often called the $t$-score. For the two independent samples case like what we have for comparing action movies to romance movies, the formula is $$T =\dfrac{ (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{ \sqrt{\dfrac{{s_1}^2}{n_1} + \dfrac{{s_2}^2}{n_2}} }$$ +Another form of standardization occurs when we need to use the sample standard deviations as estimates for the unknown population standard deviations. This standardization is often called the $t$-score. 
For the two independent samples case like what we have for comparing action movies to romance movies, the formula is $$T =\dfrac{ (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{ \sqrt{\dfrac{{s_1}^2}{n_1} + \dfrac{{s_2}^2}{n_2}} }$$ There is a lot to try to unpack here. @@ -758,7 +774,7 @@ Since all three conditions are met, we can be reasonably certain that the theory ---- +*** diff --git a/11-inference-for-regression.Rmd b/11-inference-for-regression.Rmd index 11a5cb3f2..e932aed02 100644 --- a/11-inference-for-regression.Rmd +++ b/11-inference-for-regression.Rmd @@ -23,20 +23,18 @@ set.seed(76) ---- +*** + -```{block, type='learncheck', purl=FALSE} -**Note: This chapter is still under construction. If you would like to contribute, please check us out on GitHub at .** -
-/begin{center} -`r include_image(path = "images/sign-2408065_1920.png", html_opts="height=100px", - latex_opts = "width=20%")` -/end{center} -
+```{block, type='announcement', purl=FALSE} +**In preparation for our first print edition to be published by CRC Press in Fall 2019, we're remodeling this chapter a bit. Don't expect major changes in content, but rather only minor changes in presentation. Our remodeling will be complete and available online at [ModernDive.com](https://moderndive.com/) by early Summer 2019!** ``` ---- + + +*** + ### Needed packages {-} @@ -61,7 +59,7 @@ library(patchwork) ---- +*** @@ -162,7 +160,7 @@ Since `r pull(slope_obs)` falls far to the right of this plot beyond where any o ---- +*** @@ -182,7 +180,7 @@ To further reinforce the process being done in the pipeline, we've added the `ty If instead we'd like to get a range of plausible values for the true slope value, we can use the process of bootstrapping: -```{r echo=FALSE} +```{r} bootstrap_slope_distn <- evals %>% specify(score ~ bty_avg) %>% generate(reps = 10000, type = "bootstrap") %>% @@ -227,7 +225,7 @@ With the bootstrap distribution being close to symmetric, it makes sense that th ---- +*** @@ -333,7 +331,7 @@ An R script file of all R code used in this chapter is available [here](scripts/ ---- +*** diff --git a/12-thinking-with-data.Rmd b/12-thinking-with-data.Rmd index 6c189cfa9..bc3816226 100755 --- a/12-thinking-with-data.Rmd +++ b/12-thinking-with-data.Rmd @@ -11,29 +11,35 @@ rq <- 0 knitr::opts_chunk$set( tidy = FALSE, - out.width = '\\textwidth' + out.width = '\\textwidth', + fig.height = 4, + warning = FALSE ) + options(scipen = 99, digits = 3) -# This bit of code is a bug fix on asis blocks, which we use to show/not show LC -# solutions, which are written like markdown text. In theory, it shouldn't be -# necessary for knitr versions <=1.11.6, but I've found I still need to for -# everything to knit properly in asis blocks. More info here: -# https://stackoverflow.com/questions/32944715/conditionally-display-block-of-markdown-text-using-knitr -library(knitr) -knit_engines$set(asis = function(options) { - if (options$echo && options$eval) knit_child(text = options$code) -}) +# Set random number generator see value for replicable pseudorandomness. Why 76? +# https://www.youtube.com/watch?v=xjJ7FheCkCU +set.seed(76) +``` + + + +*** + -# This controls which LC solutions to show. Options for solutions_shown: "ALL" -# (to show all solutions), or subsets of c('4-4', '4-5'), including the -# null vector c('') to show no solutions. -solutions_shown <- c('') -show_solutions <- function(section){ - return(solutions_shown == "ALL" | section %in% solutions_shown) - } + +```{block, type='announcement', purl=FALSE} +**In preparation for our first print edition to be published by CRC Press in Fall 2019, we're remodeling this chapter a bit. Don't expect major changes in content, but rather only minor changes in presentation. Our remodeling will be complete and available online at [ModernDive.com](https://moderndive.com/) by early Summer 2019!** ``` + + +*** + + + + Recall in Section \@ref(sec:intro-for-students) "Introduction for students" and at the end of chapters throughout this book, we displayed the "ModernDive flowchart" mapping your journey through this book. ```{r moderndive-figure-conclusion, echo=FALSE, fig.align='center', fig.cap="ModernDive Flowchart"} @@ -88,19 +94,11 @@ library(scales) ``` -### DataCamp {-} - -The case study of Seattle house prices below was the inspiration for a large part of ModernDive co-author [Albert Y. Kim's](https://twitter.com/rudeboybert) DataCamp course "Modeling with Data in the Tidyverse." 
If you're interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 "Introduction to Modeling" and Chapter 3 "Modeling with Multiple Regression." - -```{r, echo=FALSE, results='asis'} -image_link(path = "images/datacamp_intro_to_modeling.png", link = "https://www.datacamp.com/courses/modeling-with-data-in-the-tidyverse") -``` - -Case studies involving data in the `fivethirtyeight` R package form the basis of ModernDive co-author [Chester Ismay's](https://twitter.com/old_man_chester?lang=en) DataCamp course "Effective Data Storytelling in the Tidyverse." This free course can be accessed [here](https://www.datacamp.com/courses/effective-data-storytelling-using-the-tidyverse-free). *** + ## Case study: Seattle house prices {#seattle-house-prices} [Kaggle.com](https://www.kaggle.com/) is a machine learning and predictive modeling competition website that hosts datasets uploaded by companies, governmental organizations, and other individuals. One of their datasets is the [House Sales in King County, USA](https://www.kaggle.com/harlfoxem/housesalesprediction) consisting of homes sold in between May 2014 and May 2015 in King County, Washington State, USA, which includes the greater Seattle metropolitan area. This [CC0: Public Domain](https://creativecommons.org/publicdomain/zero/1.0/) licensed dataset is included in the `moderndive` package in the `house_prices` data frame, which we'll refer to as the "Seattle house prices" dataset. @@ -229,7 +227,7 @@ data_frame(Price = c(1,10,100,1000,10000,100000,1000000)) %>% Let's break this down: 1. When purchasing a cup of coffee, we tend to think of prices ranging in single dollars e.g. \$2 or \$3. However when purchasing say mobile phones, we don't tend to think in prices in single dollars e.g. \$676 or \$757, but tend to round to the nearest unit of hundreds of dollars e.g. \$200 or \$500. -1. Let's say want to know the log10-transformed value of \$76. Even if this would be hard to compute without a calculator, we know that its log10 value is between 1 and 2, since \$76 is between \$10 and \$100. In fact, `log10(76)` is 1.880814. +1. Let's say we want to know the log10-transformed value of \$76. Even if this would be hard to compute without a calculator, we know that its log10 value is between 1 and 2, since \$76 is between \$10 and \$100. In fact, `log10(76)` is 1.880814. 1. log10-transformations are *monotonic*, meaning they preserve orderings. So if Price A is lower than Price B, then log10(Price A) will also be lower than log10(Price B). 1. Most importantly, increments of one in log10 correspond to multiplicative changes and not additive ones. For example, increasing from log10(Price) of 3 to 4 corresponds to a multiplicative increase by a factor of 10: \$100 to \$1000. @@ -441,22 +439,26 @@ intepreting the inference for regression in Subsection \@ref(house-prices-infere +*** + + + ## Case study: Effective data storytelling {#data-journalism} ---- ```{block, type='learncheck', purl=FALSE} **Note: This section is still under construction. If you would like to contribute, please check us out on GitHub at .** +```
-/begin{center} -`r include_image(path = "images/sign-2408065_1920.png", html_opts="height=100px", - latex_opts = "width=20%")` -/end{center} +`r include_image(path = "images/sign-2408065_1920.png", html_opts="height=100px", latex_opts = "width=20%")`
-``` ---- + + +*** + + As we've progressed throughout this book, you've seen how to work with data in a variety of ways. You've learned effective strategies for plotting data by understanding which types of plots work best for which combinations of variable types. You've summarized data in table form and calculated summary statistics for a variety of different variables. Further, you've seen the value of inference as a process to come to conclusions about a population by using a random sample. Lastly, you've explored how to use linear regression and the importance of checking the conditions required to make it a valid procedure. All throughout, you've learned many computational techniques and focused on reproducible research in writing R code. We now present another case study, but this time of the "effective data storytelling" done by data journalists around the world. Great data stories don't mislead the reader, but rather engulf them in understanding the importance that data plays in our lives through the captivation of storytelling. diff --git a/91-appendixA.Rmd b/91-appendixA.Rmd index f3d2a1b79..b1957a517 100755 --- a/91-appendixA.Rmd +++ b/91-appendixA.Rmd @@ -38,3 +38,7 @@ The **distribution** of a variable/dataset corresponds to generalizing patterns **Outliers** correspond to values in the dataset that fall far outside the range of "ordinary" values. In regards to a boxplot (by default), they correspond to values below $Q_1 - (1.5 * IQR)$ or above $Q_3 + (1.5 * IQR)$. Note that these terms (aside from **Distribution**) only apply to quantitative variables. + + + +## Normal distribution discussion diff --git a/92-appendixB.Rmd b/92-appendixB.Rmd index 16c2d37e0..4cf0e3c61 100755 --- a/92-appendixB.Rmd +++ b/92-appendixB.Rmd @@ -233,7 +233,11 @@ We see that `r mu0` is not contained in this confidence interval as a plausible **Interpretation**: We are 95% confident the true mean age of first marriage for all US women from 2006 to 2010 is between `r ci[["2.5%"]]` and `r ci[["97.5%"]]`. ---- + + +*** + + ### Traditional methods @@ -469,7 +473,11 @@ We see that 0.80 is contained in this confidence interval as a plausible value o **Interpretation**: We are 95% confident the true proportion of customers who are satisfied with the service they receive is between `r ci[["2.5%"]]` and `r ci[["97.5%"]]`. ---- + + +*** + + ### Traditional methods @@ -722,7 +730,11 @@ We see that 0 is not contained in this confidence interval as a plausible value **Interpretation**: We are 95% confident the true proportion of non-college graduates with no opinion on offshore drilling in California is between `r round(-ci[["2.5%"]], 2)` dollars smaller to `r round(-ci[["97.5%"]], 2)` dollars smaller than for college graduates. ---- + + +*** + + ### Traditional methods @@ -784,7 +796,11 @@ The $p$-value---the probability of observing a $Z$ value of -3.16 or more extrem We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that a statistically significant difference did not exist in the proportions of no opinion on offshore drilling between college educated and non-college educated Californians was not validated. We do have evidence to suggest that there is a dependency between college graduation and position on offshore drilling for Californians. 
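For reference, a quick sketch (ours, not from the appendix) of how such a tail probability can be computed in R from the standard normal distribution; whether one or both tails apply depends on the alternative hypothesis being tested:

```{r}
# Illustrative: tail probabilities for an observed test statistic of Z = -3.16
# under a standard normal distribution
pnorm(-3.16)       # lower-tail probability
2 * pnorm(-3.16)   # both tails, i.e. "-3.16 or more extreme in either direction"
```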
---- + + +*** + + ### Comparing results @@ -997,7 +1013,11 @@ We see that 0 is contained in this confidence interval as a plausible value of $ **Note**: You could also use the null distribution based on randomization with a shift to have its center at $\bar{x}_{sac} - \bar{x}_{cle} = \$`r round(d_hat, 2)`$ instead of at 0 and calculate its percentiles. The confidence interval produced via this method should be comparable to the one done using bootstrapping above. ---- + + +*** + + ### Traditional methods @@ -1258,7 +1278,11 @@ We see that 0 is not contained in this confidence interval as a plausible value **Interpretation**: We are 95% confident the true mean zinc concentration on the surface is between `r round(-ci[["2.5%"]], 2)` units smaller to `r round(-ci[["97.5%"]], 2)` units smaller than on the bottom. ---- + + +*** + + ### Traditional methods @@ -1316,7 +1340,11 @@ pt(-4.8638, df = nrow(zinc_diff) - 1, lower.tail = TRUE) We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample mean difference was not statistically less than the hypothesized mean of 0 has been invalidated here. Based on this sample, we have evidence that the mean concentration in the bottom water is greater than that of the surface water at different paired locations. ---- + + +*** + + ### Comparing results diff --git a/94-appendixD.Rmd b/94-appendixD.Rmd index b4b5c8506..eab5b6d98 100644 --- a/94-appendixD.Rmd +++ b/94-appendixD.Rmd @@ -1,5 +1,8 @@ # Learning Check Solutions {#appendixD} + + ```{r setup_lc_solutions, include=FALSE, purl=FALSE} knitr::opts_chunk$set(tidy = FALSE, out.width = '\\textwidth') # This bit of code is a bug fix on asis blocks, which we use to show/not show LC solutions, which are written like markdown text. In theory, it shouldn't be necessary for knitr versions <=1.11.6, but I've found I still need to for everything to knit properly in asis blocks. More info here: @@ -27,6 +30,19 @@ library(ggplot2) library(nycflights13) ``` +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Repeat the above installing steps, but for the `dplyr`, `nycflights13`, and `knitr` packages. This will install the earlier mentioned `dplyr` package, the `nycflights13` package containing data on all domestic flights leaving a NYC airport in 2013, and the `knitr` package for writing reports in R. + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** "Load" the `dplyr`, `nycflights13`, and `knitr` packages as well by repeating the above steps. + +**Solution**: If the following code runs with no errors, you've succeeded! + +```{r, eval=FALSE} +library(dplyr) +library(nycflights13) +library(knitr) +``` + + **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What does any *ONE* row in this `flights` dataset refer to? - A. Data on an airline @@ -62,6 +78,18 @@ library(nycflights13) * `chr`: character. i.e. text +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What properties of the observational unit do each of `lat`, `lon`, `alt`, `tz`, `dst`, and `tzone` describe for the `airports` data frame? Note that you may want to use `?airports` to get more information. + +**Solution**: `lat` `long` represent the airport geographic coordinates, `alt` is the altitude above sea level of the airport (Run `airports %>% filter(faa == "DEN")` to see the altitude of Denver International Airport), `tz` is the time zone difference with respect to GMT in London UK, `dst` is the daylight savings time zone, and `tzone` is the time zone label. 
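For illustration (this chunk is ours, not part of the original solutions), one way to view these variables for a single airport:

```{r}
library(dplyr)
library(nycflights13)

# Illustrative: inspect the geographic and time zone variables for Denver
airports %>% 
  filter(faa == "DEN") %>% 
  select(faa, name, lat, lon, alt, tz, dst, tzone)
```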
+ + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions. + +**Solution**: + +* In the `weather` example in LC3.8, the combination of `origin`, `year`, `month`, `day`, `hour` are identification variables as they identify the observation in question. +* Anything else pertains to observations: `temp`, `humid`, `wind_speed`, etc. + *** @@ -147,7 +175,7 @@ ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = humid)) + geom_line() ``` -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What does changing the number of bins from 30 to 60 tell us about the distribution of temperatures? +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What does changing the number of bins from 30 to 40 tell us about the distribution of temperatures? **Solution**: The distribution doesn't change much. But by refining the bin width, we see that the temperature data has a high degree of accuracy. What do I mean by accuracy? Looking at the `temp` variabile by `View(weather)`, we see that the precision of each temperature recording is 2 decimal places. @@ -190,7 +218,7 @@ the middle 50% of values, as delineated by the interquartile range is 30°F: **Solution**: -* We'd have 365 facets to look at. Way to many. +* We'd have 365 facets to look at. Way too many. * We don't really care about day-to-day fluctuation in weather so much, but maybe more week-to-week variation. We'd like to focus on seasonal trends. **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Does the `temp` variable in the `weather` data-set have a lot of variability? Why do you say that? @@ -241,9 +269,9 @@ weather %>% kable() ``` -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** We looked at the distribution of a numerical variable over a categorical variable here with this boxplot. Why can't we look at the distribution of one numerical variable over the distribution of another numerical variable? Say, temperature across pressure, for example? +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** We looked at the distribution of the numerical variable `temp` split by the numerical variable `month` that we converted to a categorical variable using the `factor()` function. Why would a boxplot of `temp` split by the numerical variable `pressure` similarly converted to a categorical variable using the `factor()` not be informative? -**Solution**: Because we need a way to group many numerical observations together, say by grouping by month. For pressure, we have near unique values for pressure, i.e. no groups, so we can't make boxplots. +**Solution**: Because there are 12 unique values of `month` yielding only 12 boxes in our boxplot. There are many more unique values of `pressure` (`r weather$pressure %>% unique() %>% length()` unique values in fact), because values are to the first decimal place. This would lead to `r weather$pressure %>% unique() %>% length()` boxes, which is too many for people to digest. **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram? @@ -311,127 +339,7 @@ weather %>% chap <- 4 lc <- 0 # This controls which LC solutions to show. 
Options for solutions_shown: "ALL" (to show all solutions), or subsets of c('2-1', '2-2'), including the null vector c('') to show no solutions. -solutions_shown <- c('4-1', '4-2', '4-3', '4-4') -# solutions_shown <- c('') -show_solutions <- function(section){return(solutions_shown == "ALL" | section %in% solutions_shown)} -``` - -```{r message=FALSE} -library(dplyr) -library(ggplot2) -library(nycflights13) -library(tidyr) -library(readr) -``` - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Consider the following data frame of average number of servings of beer, spirits, and wine consumption in three countries as reported in the FiveThirtyEight article [Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?](https://fivethirtyeight.com/features/dear-mona-followup-where-do-people-drink-the-most-beer-wine-and-spirits/) - -```{r echo=FALSE} -drinks_sub <- drinks %>% - select(-total_litres_of_pure_alcohol) %>% - filter(country %in% c("USA", "Canada", "South Korea")) -drinks_sub_tidy <- drinks_sub %>% - gather(type, servings, -c(country)) %>% - mutate( - type = str_sub(type, start=1, end=-10) - ) %>% - arrange(country, type) %>% - rename(`alcohol type` = type) -drinks_sub -``` - -This data frame is not in tidy format. What would it look like if it were? - -**Solution**: There are three variables of information included: country, alcohol type, and number of servings. In tidy format, each of these variables of information are included in their own column. - -```{r, include=show_solutions('4-1'), echo=FALSE} -drinks_sub_tidy -``` - -Note that how the rows are sorted is inconsequential in whether or not the data frame is in tidy format. In other words, the following data frame sorted by alcohol type instead of country is equally in tidy format. - -```{r, include=show_solutions('4-1'), echo=FALSE} -drinks_sub_tidy %>% - arrange(`alcohol type`) -``` - - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What properties of the observational unit do each of `lat`, `lon`, `alt`, `tz`, `dst`, and `tzone` describe for the `airports` data frame? Note that you may want to use `?airports` to get more information. - -**Solution**: `lat` `long` represent the airport geographic coordinates, `alt` is the altitude above sea level of the airport (Run `airports %>% filter(faa == "DEN")` to see the altitude of Denver International Airport), `tz` is the time zone difference with respect to GMT in London UK, `dst` is the daylight savings time zone, and `tzone` is the time zone label. - - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions. - -**Solution**: - -* In the `weather` example in LC3.8, the combination of `origin`, `year`, `month`, `day`, `hour` are identification variables as they identify the observation in question. -* Anything else pertains to observations: `temp`, `humid`, `wind_speed`, etc. - - - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Convert the `dem_score` data frame into -a tidy data frame and assign the name of `dem_score_tidy` to the resulting long-formatted data frame. - -**Solution**: Running the following in the console: - -```{r, include=show_solutions('4-3')} -dem_score_tidy <- gather(data = dem_score, key = year, value = democracy_score, - country) -``` - -Let's now compare the `dem_score` and `dem_score_tidy`. 
`dem_score` has democracy score information for each year in columns, whereas in `dem_score_tidy` there are explicit variables `year` and `democracy_score`. While both representations of the data contain the same information, we can only use `ggplot()` to create plots using the `dem_score_tidy` data frame. - -```{r, include=show_solutions('4-3')} -dem_score -dem_score_tidy -``` - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Read in the life expectancy data stored at and convert it to a tidy data frame. - -**Solution**: The code is similar - -```{r, eval=FALSE,include=show_solutions('4-3'), echo=show_solutions('4-3')} -life_expectancy <- read_csv('https://moderndive.com/data/le_mess.csv') -life_expectancy_tidy <- gather(data = life_expectancy, key = year, value = life_expectancy, -country) -``` -```{r, echo=FALSE, purl=FALSE, message=FALSE, warning=FALSE} -life_expectancy <- read_csv('data/le_mess.csv') -life_expectancy_tidy <- gather(data = life_expectancy, key = year, value = life_expectancy, -country) -``` - -We observe the same construct structure with respect to `year` in `life_expectancy` vs `life_expectancy_tidy` as we did in `dem_score` vs `dem_score_tidy`: - -```{r, lc4-2solutions-4, include=show_solutions('4-3')} -life_expectancy -life_expectancy_tidy -``` - - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are common characteristics of "tidy" datasets? - -**Solution**: Rows correspond to observations, while columns correspond to variables. - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What makes "tidy" datasets useful for organizing data? - -**Solution**: Tidy datasets are an organized way of viewing data. We'll see later that this format is required for the `ggplot2` and `dplyr` packages for data visualization and wrangling. - -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are some advantages of data in normal forms? What are some disadvantages? - -**Solution**: When datasets are in normal form, we can easily `_join` them with other datasets! For example, can we join the `flights` data with the `planes` data? We'll see this more in Chapter 5! - - - -*** - - - -## Chapter 5 Solutions - -```{r, include=FALSE, purl=FALSE} -chap <- 5 -lc <- 0 -# This controls which LC solutions to show. Options for solutions_shown: "ALL" (to show all solutions), or subsets of c('2-1', '2-2'), including the null vector c('') to show no solutions. -solutions_shown <- c('5-1', '5-2', '5-3', '5-4', '5-5', '5-6', '5-7') +solutions_shown <- c('4-1', '4-2', '4-3', '4-4', '4-5', '4-6', '4-7') # solutions_shown <- c('') show_solutions <- function(section){return(solutions_shown == "ALL" | section %in% solutions_shown)} ``` @@ -443,7 +351,7 @@ library(nycflights13) ``` -**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What's another way using the "not" operator `!` we could filter only the rows that are not going to Burlington, VT nor Seattle, WA in the `flights` data frame? Test this out using the code above. +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What's another way using the "not" operator `!` to filter only the rows that are not going to Burlington, VT nor Seattle, WA in the `flights` data frame? Test this out using the code above. **Solution**: @@ -636,6 +544,10 @@ with? **Solution**: This question is subjective! What surprises me is the high number of flights to Boston. Wouldn't it be easier and quicker to take the train? +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are some advantages of data in normal forms? 
What are some disadvantages? + +**Solution**: When datasets are in normal form, we can easily `_join` them with other datasets! For example, we can join the `flights` data with the `planes` data. + **`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are some ways to select all three of the `dest`, `air_time`, and `distance` variables from `flights`? Give the code showing how to do this in at least three different ways. **Solution**: @@ -743,7 +655,7 @@ flights %>% summarize(ASM = sum(ASM)) ``` -However, because for certain carriers certain flights have missing `NA` values, the resulting table also returns `NA`'s. We can eliminate these by adding a `na.rm = TRUE` argument to `sum()`, telling R that we want to remove the `NA`'s in the sum. We saw this in Section \ref(summarize): +However, because for certain carriers certain flights have missing `NA` values, the resulting table also returns `NA`'s. We can eliminate these by adding a `na.rm = TRUE` argument to `sum()`, telling R that we want to remove the `NA`'s in the sum. We saw this in Section \@ref(summarize): ```{r, include=show_solutions('5-7')} flights %>% @@ -787,8 +699,114 @@ flights %>% +## Chapter 5 Solutions + +```{r, include=FALSE, purl=FALSE} +chap <- 5 +lc <- 0 +# This controls which LC solutions to show. Options for solutions_shown: "ALL" (to show all solutions), or subsets of c('2-1', '2-2'), including the null vector c('') to show no solutions. +solutions_shown <- c('5-1', '5-2', '5-3', '5-4') +# solutions_shown <- c('') +show_solutions <- function(section){return(solutions_shown == "ALL" | section %in% solutions_shown)} +``` + +```{r message=FALSE} +library(dplyr) +library(ggplot2) +library(nycflights13) +library(tidyr) +library(readr) +``` + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What are common characteristics of "tidy" datasets? + +**Solution**: Rows correspond to observations, while columns correspond to variables. + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** What makes "tidy" datasets useful for organizing data? + +**Solution**: Tidy datasets are an organized way of viewing data. This format is required for the `ggplot2` and `dplyr` packages for data visualization and wrangling. + + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Take a look the `airline_safety` data frame included in the `fivethirtyeight` data. Run the following: + +```{r, eval=FALSE} +airline_safety +``` + +After reading the help file by running `?airline_safety`, we see that `airline_safety` is a data frame containing information on different airlines companies' safety records. This data was originally reported on the data journalism website FiveThirtyEight.com in Nate Silver's article ["Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?"](https://fivethirtyeight.com/features/should-travelers-avoid-flying-airlines-that-have-had-crashes-in-the-past/). Let's ignore the `incl_reg_subsidiaries` and `avail_seat_km_per_week` variables for simplicity: + +```{r} +airline_safety_smaller <- airline_safety %>% + select(-c(incl_reg_subsidiaries, avail_seat_km_per_week)) +airline_safety_smaller +``` + +This data frame is not in "tidy" format. How would you convert this data frame to be in "tidy" format, in particular so that it has a variable `incident_type_years` indicating the indicent type/year and a variable `count` of the counts? 
+ +**Solution**: Using the `gather()` function from the `tidyr` package: + +```{r} +airline_safety_smaller_tidy <- airline_safety_smaller %>% + gather(key = incident_type_years, value = count, -airline) +airline_safety_smaller_tidy +``` + +If you look at the resulting `airline_safety_smaller_tidy` data frame in the spreadsheet viewer, you'll see that the variable `incident_type_years` has 6 possible values: `"incidents_85_99", "fatal_accidents_85_99", "fatalities_85_99", +"incidents_00_14", "fatal_accidents_00_14", "fatalities_00_14"` corresponding to the 6 columns of `airline_safety_smaller` we tidied. + + + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Convert the `dem_score` data frame into +a tidy data frame and assign the name of `dem_score_tidy` to the resulting long-formatted data frame. + +**Solution**: Running the following in the console: + +```{r, include=show_solutions('4-3')} +dem_score_tidy <- dem_score %>% + gather(key = year, value = democracy_score, - country) +``` + +Let's now compare the `dem_score` and `dem_score_tidy`. `dem_score` has democracy score information for each year in columns, whereas in `dem_score_tidy` there are explicit variables `year` and `democracy_score`. While both representations of the data contain the same information, we can only use `ggplot()` to create plots using the `dem_score_tidy` data frame. + +```{r, include=show_solutions('4-3')} +dem_score +dem_score_tidy +``` + +**`r paste0("(LC", chap, ".", (lc <- lc + 1), ")")`** Read in the life expectancy data stored at and convert it to a tidy data frame. + +**Solution**: The code is similar + +```{r, eval=FALSE,include=show_solutions('4-3'), echo=show_solutions('4-3')} +life_expectancy <- read_csv("https://moderndive.com/data/le_mess.csv") +life_expectancy_tidy <- life_expectancy %>% + gather(key = year, value = life_expectancy, -country) +``` +```{r, echo=FALSE, purl=FALSE, message=FALSE, warning=FALSE} +life_expectancy <- read_csv('data/le_mess.csv') +life_expectancy_tidy <- life_expectancy %>% + gather(key = year, value = life_expectancy, -country) +``` + +We observe the same construct structure with respect to `year` in `life_expectancy` vs `life_expectancy_tidy` as we did in `dem_score` vs `dem_score_tidy`: + +```{r, lc4-2solutions-4, include=show_solutions('4-3')} +life_expectancy +life_expectancy_tidy +``` + + + + +*** + + + + ## Chapter 6 Solutions +To come! + ```{r, include=FALSE, purl=FALSE} chap <- 6 lc <- 0 diff --git a/NEWS.md b/NEWS.md index 3c2af5a3b..2d66c5439 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,26 +1,9 @@ -# ModernDive 0.4.0.9000 +# ModernDive 0.5.0.9000 ## Major refactoring of inference chapters of book -### Old chapter structure - -* Chapter 8 - Sampling - 1. Introduction to sampling - a) Concepts related to sampling - b) Inference via sampling - 2. Tactile sampling simulation - a) Using the shovel once - b) Using the shovel 33 times - 3. Virtual sampling simulation - a) Using the shovel once - b) Using shovel 33 times - c) Using shovel 1000 times - d) Using different shovels - 4. In real-life sampling: Polls - 5. Conclusion - a) Central Limit Theorem - b) What’s to come? - c) Script of R code +**Old Chapter Structure**: + * Chapter 9 - Confidence Intervals 1. Bootstrapping a) Data explanation @@ -83,61 +66,36 @@ d) Script of R code -### New chapter structure - -* Chapter 8 - Sampling - 1. Activity: Sampling from a bowl - a) Question: What proportion of this bowl is red? - b) Using shovel once - c) Using shovel 33 times - 1. 
Computer simulation: - a) What is a simulation? We just did a "tactile" one by hand, now let's do one using the the computer - b) Using shovel once - c) Using shovel 33 times - d) Using shovel 1000 times - e) Using different shovels - 1. Goal: Study fluctuations due to sampling variation - a) You probably already knew: Bigger sample size means "better" guess. - b) Comparing shovels: Role of sample size - 1. Framework: Sampling - a) Terminology for sampling (population, sample, point estimate, etc) - b) Statistical concepts: sampling distribution and standard error - c) Computer's random number generator - 1. Interpretation: - a) Visual display of differences - 1. Case study: Obama poll - 1. Big picture: - a) Table of inferential scenarios: Add bowl and obama poll (both p) - b) Why does this work? Theoretial result: CLT - c) There's a formula for that: SE formula that has sqrt(n) at the bottom - d) Appendix: Normal distribution discussion +**New Chapter Structure**: + * Chapter 9 - Confidence Intervals 1. Activity: Working with a sample of pennies from the bank. Are they representative of all pennies in the US. a) Question: What do I do when I only have one sample? b) Resampling once (paper slips) c) Resampling 33 times + d) Diagrams in Keynote 1. Computer simulation: a) What is resampling? b) Resampling once c) Resampling 33 times d) Resampling 1000 times 1. Goal: Generate an estimate that accounts for sampling variation - a) Constructing a confidence interval + a) Constructing a confidence interval: hide code to shade ci region and to get the actual values. b) Constructing a CI using percentile method c) Constructing a CI using SE method 1. Framework: Boostrap resampling with replacement a) What dplyr verbs did we use? b) There is only one test framework - c) the infer package + c) the infer package: make sure to draw parallels between dplyr code and infer verbs 1. Interpretation: - a) 95% speaks to reliability of the process, not about an particular interval + a) 95% speaks to reliability of the process, not about an particular interval. "We are 95% confident" b) What determines the width? Sample size, confidence levels (only int at population variance) 1. Case study: Comparing two proportions with Mythbusters data 1. Big picture: - a) Does this even work? Comparing sampling and bootstrap distribution. + a) Does this even work? Comparing sampling and bootstrap distribution. Do this using balls. b) Table of inferential scenarios: Add pennies (mu) and Mythbusters (p1 - p2) - c) Why does this work? Theoretical result: Donsker's theorem. The empirical CDF converges to the population CDF. Bootstrap works for any point estimate - d) There's a formula for that! Margin of error using critical values z* + c) Why does this work? Theoretical result: Efron. The empirical CDF converges to the population CDF. Bootstrap works for any point estimate + d) There's a formula for that! Margin of error using critical values z. Talk about normal distributions. * Chapter 10 - Hypothesis Testing 1. Activity: Shuffling resumes between male and female job applicants a) Question: Are men and women rated for jobs differently? @@ -145,79 +103,147 @@ c) What about sampling variation? d) What did we actually observe? e) How likely is this result? - 1. Computer simulation: + f) Diagrams in Keynote + 1. Extension of previous framework/infer + a) Revisit verb framework + a) Permutation test resampling w/o replacement + b) There is only one test framework + a) Do activity via infer package 1. 
Goal: Choose between two possible truths while accounting for sampling variation a) Conducting a hypothesis test b) Null hypothesis that's assumed c) Null distribution of test statistics: A "alternate universe" distribution d) Observed test statistics e) Definition of p-value - 1. Framework: Permutation test resampling w/o replacement - a) Revisit verb framework - b) There is only one test framework - c) the infer package 1. Interpretation: a) A yes/no-type decision: statistical significance via alpha b) Types of errors: 2x2 table c) Analogy of criminal justice system 1. Case study: Comparing two means with action vs romance movie data 1. Big picture: + a) When is inference not needed: EDA can solve the problem. a) Problems with p-values: p-hacking, hard to understand, ASA statement b) Comparison with confidence intervals. HT yields binary decision, but CI's yield plausible range of estimates. This is statistical vs practical significance c) Table of inferential scenarios: Add action vs romance (mu1 - mu2) - d) Why does this work? Theoretical result: Neyman-Pearson lemma - e) There's a formula for that! t-test + d) Why does this work? Theoretical result: Neyman-Pearson lemma (maybe) + e) There's a formula for that! t-test. Draw a null distribution with t-distribution superimposed. * Chapter 11 - Inference for Regression 1. Activity: Revisit simple linear regression a) Question: Is there a significant relationship between teaching score and bty score above and beyond any evidence due to sampling variation. b) Review exercise/re-run all code c) Regression table 1. Computer simulation: - a) Bootstraping the relationship - b) Permuting the relationship + a) Permuting the relationship: to do a hypothesis test assuming independence of y & x. + a) Bootstraping the rows: Having done HT, generate confidence interval. 1. Goal: Inferring about the population regression slope - a) 1. Framework: 1. Interpretation: - a) Values in table are given! No simulations necessary! - b) Conditions for inference: residual and partial residual plots - 1. Case study: Mmultiple regression example from Ch 7. - a) + a) "You don't have to do any of this! Values in table are given!" No simulations necessary! + b) Conditions for inference: residual and partial residual plots, assumption of indepdence. + 1. Case study: Multiple regression example from Ch 7. 1. Big picture: a) ANOVA = Regression with categorical variables - b) Table of inferential scenarios: Add TBD (beta1) - c) Why does this work? Theoretical result: Gauss-Markov Theorem - d) There's a formula for that! Fitted intercept and slope. SE of fitted intercept and slope. Note there is a sqrt(n) in denominator. + b) Table of inferential scenarios: Add (beta1) + c) Why does this work? + d) There's a formula for that! Fitted intercept and slope. SE of fitted intercept and slope: observe there is a sqrt(n) in denominator. + +*** + + + +# ModernDive 0.5.0 + +## Highlights + +* "Data wrangling" chapter now comes after "Tidy data" chapter. 
+* Improved explanations and examples of `geom_histogram()`, `geom_boxplot()`, and "tidy" data +* Moving residual analysis from regression Chapters 6 & 7 to Chap 11: Inference for regression +* Reorganized Chap 8 on Sampling +* All learning check solutions now in Appendix D +* PDF build re-added (still a work-in-progress) + ## All content changes -* Changed title from "Statistical Inference via Data Science in R" to "Statistical Inference via Data Science: A moderndive into R and the tidyverse" +* Changed title + + From: "Statistical Inference via Data Science in R" + + To: "Statistical Inference via Data Science: A moderndive into R and the tidyverse" * Chapter 2 - Getting Started + Added subsection 2.2.3 "Errors, warnings, and messages" by @andrewheiss * Chapter 3 - Data visualization: - + Added simpler introductory `geom_histogram()` example - + Added simpler introductory `geom_boxplot()` example - + Started downweighting the amount of data wrangling previews included in this chapter, in particular `join` + + Added simpler introductory `geom_histogram()` and `geom_boxplot()` examples + + Started downweighting the amount of data wrangling previews included in this chapter, in particular `join`. + Cleaned up conclusion section + + Added cheatsheet * Switched order of "Chap 4 Tidy Data" and "Chap 5 Data Wrangling": Data Wrangling now comes first * Chapter 4 - Data wrangling: + + Added cheatsheet * Chapter 5 - Renamed to "Importing and tidy data" + Reordered sections: importing then tidying - + Added `fivethirtyeight::drinks` example of hitting the non-tidy wall, then using `tidyr::gather()` + + Added `fivethirtyeight::drinks` example of "hitting the non-tidy wall", then using `tidyr::gather()` + Made Guatemala democracy score a case study. + Added discussion on what `tidyverse` package is. + + Moved discussion on normal forms to Ch4: Data Wrangling - joins. + + Moved discussion on identification vs measurement variables to Ch2: Getting started with data. * Chapter 6 - Basic regression: + Moved residual analysis to Chapter 11 * Chapter 7 - Multiple regression: + Moved residual analysis to Chapter 11 +* Chapter 8 - Sampling: Major refactoring of presentation/exposition; see below * Chapter 11 - Inference for regression: + Moved residual analysis from Chapter 6 & 7 here * Moved all Learning Check solutions to Appendix D - -## Other changes -* Added PDF build + +### Chapter 8 Sampling Refactoring + +**Old chapter structure**: + +1. Introduction to sampling + a) Concepts related to sampling + b) Inference via sampling +2. Tactile sampling simulation + a) Using the shovel once + b) Using the shovel 33 times +3. Virtual sampling simulation + a) Using the shovel once + b) Using shovel 33 times + c) Using shovel 1000 times + d) Using different shovels +4. In real-life sampling: Polls +5. Conclusion + a) Central Limit Theorem + b) What’s to come? + c) Script of R code + +**New chapter structure**: + +1. Activity: Sampling from a bowl + a) Question: What proportion of this bowl is red? + b) Using shovel once + c) Using shovel 33 times +1. Computer simulation: + a) What is a simulation? We just did a "tactile" one by hand, now let's do one using the the computer + b) Using shovel once + c) Using shovel 33 times + d) Using shovel 1000 times + e) Using different shovels +1. Goal: Study fluctuations due to sampling variation + a) You probably already knew: Bigger sample size means "better" guess. + b) Comparing shovels: Role of sample size +1. 
Framework: Sampling + a) Terminology for sampling (population, sample, point estimate, etc) + b) Statistical concepts: sampling distribution and standard error + c) Computer's random number generator +1. Interpretation: + a) Visual display of differences +1. Case study: Obama poll +1. Big picture: + a) Table of inferential scenarios: Add bowl and obama poll (both p) + b) Why does this work? Theoretial result: CLT + c) There's a formula for that: SE formula that has sqrt(n) at the bottom + d) Appendix: Normal distribution discuss diff --git a/_output.yml b/_output.yml index acfa7773e..7946f5888 100755 --- a/_output.yml +++ b/_output.yml @@ -1,4 +1,20 @@ # Modified from https://github.com/rstudio/bookdown/blob/master/inst/examples/_output.yml +#bookdown::pdf_book: +# includes: +# in_header: latex/preamble.tex +# before_body: latex/before_body.tex +# after_body: latex/after_body.tex +# keep_tex: true +# dev: "cairo_pdf" +# latex_engine: xelatex +# citation_package: natbib +# template: null +# pandoc_args: --top-level-division=chapter +# toc_depth: 3 +# toc_unnumbered: false +# toc_appendix: true +# quote_footer: ["\\VA{", "}{}"] +# highlight_bw: true bookdown::gitbook: df_print: default css: style.css @@ -18,19 +34,4 @@ bookdown::gitbook: before_body: _includes/logo.html #bookdown::epub_book: default #bookdown::word_document2: default -#bookdown::pdf_book: -# includes: -# in_header: latex/preamble.tex -# before_body: latex/before_body.tex -# after_body: latex/after_body.tex -# keep_tex: true -# dev: "cairo_pdf" -# latex_engine: xelatex -# citation_package: natbib -# template: null -# pandoc_args: --top-level-division=chapter -# toc_depth: 3 -# toc_unnumbered: false -# toc_appendix: true -# quote_footer: ["\\VA{", "}{}"] -# highlight_bw: true + diff --git a/docs/10-example-comparing-two-proportions.html b/docs/10-example-comparing-two-proportions.html deleted file mode 100644 index e88725482..000000000 --- a/docs/10-example-comparing-two-proportions.html +++ /dev/null @@ -1,784 +0,0 @@ - - - - - - - - Chapter 10 Example: Comparing two proportions | Statistical Inference via Data Science - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Chapter 10 Example: Comparing two proportions

-

If you see someone else yawn, are you more likely to yawn? In an episode of the show Mythbusters, they tested the myth that yawning is contagious. The snippet from the show is available to view in the United States on the Discovery Network website here. More information about the episode is also available on IMDb here.

-

Fifty adults who thought they were being considered for an appearance on the show were interviewed by a show recruiter (“confederate”) who either yawned or did not. Participants then sat by themselves in a large van and were asked to wait. While in the van, the Mythbusters watched via hidden camera to see if the unaware participants yawned. The data frame containing the results is available at mythbusters_yawn in the moderndive package. Let’s check it out.

-
mythbusters_yawn
-
# A tibble: 50 x 3
-    subj group   yawn 
-   <int> <chr>   <chr>
- 1     1 seed    yes  
- 2     2 control yes  
- 3     3 seed    no   
- 4     4 seed    yes  
- 5     5 seed    no   
- 6     6 control no   
- 7     7 seed    yes  
- 8     8 control no   
- 9     9 control no   
-10    10 seed    no   
-# … with 40 more rows
-
    -
• The participant ID is stored in the subj variable with values of 1 to 50.
• The group variable is either "seed" for when a confederate was trying to influence the participant or "control" if a confederate did not interact with the participant.
• The yawn variable is either "yes" if the participant yawned or "no" if the participant did not yawn.
-

We can use the janitor package to get a glimpse into this data in a table format:

-
mythbusters_yawn %>% 
-  tabyl(group, yawn) %>% 
-  adorn_percentages() %>% 
-  adorn_pct_formatting() %>% 
-  # To show original counts
-  adorn_ns()
-
   group         no        yes
- control 75.0% (12) 25.0%  (4)
-    seed 70.6% (24) 29.4% (10)
-

We are interested in comparing the proportion of those that yawned after seeing a seed versus those that yawned with no seed interaction. We’d like to see if the difference between these two proportions is significantly larger than 0. If so, we’d have evidence to support the claim that yawning is contagious based on this study.

-

In looking over this problem, we can make note of some important details to include in our infer pipeline:

-
    -
• We are calling a yawn value of "yes" a success.
• Our response variable will always correspond to the variable used in the success, so the response variable is yawn.
• The explanatory variable is the other variable of interest here: group.
-

To summarize, we are looking to examine the relationship between yawning and whether or not the participant saw a seed yawn.

-
-

10.0.1 Compute the point estimate

-
mythbusters_yawn %>% 
-  specify(formula = yawn ~ group)
-
Error: A level of the response variable `yawn` needs to be specified for the `success` argument in `specify()`.
-

Note that the success argument must be specified in situations such as this where the response variable has only two levels.

-
mythbusters_yawn %>% 
-  specify(formula = yawn ~ group, success = "yes")
-
Response: yawn (factor)
-Explanatory: group (factor)
-# A tibble: 50 x 2
-   yawn  group  
-   <fct> <fct>  
- 1 yes   seed   
- 2 yes   control
- 3 no    seed   
- 4 yes   seed   
- 5 no    seed   
- 6 no    control
- 7 yes   seed   
- 8 no    control
- 9 no    control
-10 no    seed   
-# … with 40 more rows
-

We next want to calculate the statistic of interest for our sample. This corresponds to the difference in the proportion of successes.

-
mythbusters_yawn %>% 
-  specify(formula = yawn ~ group, success = "yes") %>% 
-  calculate(stat = "diff in props")
-
Error: Statistic is based on a difference; specify the `order` in which to subtract the levels of the explanatory variable. `order = c("first", "second")` means `("first" - "second")`. Check `?calculate` for details.
-

We see another error here. To make sure that R knows exactly what we are after, we need to provide the order in which R should subtract these proportions of successes. As the error message states, we'll want to put "seed" first after c() and then "control": order = c("seed", "control"). Our point estimate is thus calculated:

-
obs_diff <- mythbusters_yawn %>% 
-  specify(formula = yawn ~ group, success = "yes") %>% 
-  calculate(stat = "diff in props", order = c("seed", "control"))
-obs_diff
-
# A tibble: 1 x 1
-    stat
-   <dbl>
-1 0.0441
-

This value represents the proportion of those that yawned after seeing a seed yawn (0.2941) minus the proportion of those that yawned without seeing a seed (0.25).
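As a quick sanity check (our own, using the counts from the janitor table above: 10 of 34 "seed" participants and 4 of 16 "control" participants yawned), the same point estimate can be computed by hand:

```{r}
# By-hand check of the point estimate from the counts in the table above
prop_seed    <- 10 / (10 + 24)   # proportion of "seed" participants who yawned
prop_control <- 4 / (4 + 12)     # proportion of "control" participants who yawned
prop_seed - prop_control         # matches the 0.0441 returned by calculate()
```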

-
-
-

10.0.2 Bootstrap distribution

-

Our next step in building a confidence interval is to create a bootstrap distribution of statistics (differences in proportions of successes). We saw how this works with a single variable both in computing bootstrap means in Subsection 9.1.3 and in computing bootstrap proportions in Section 9.6, but we haven't yet worked with bootstrapping involving multiple variables.

-

In the infer package, bootstrapping with multiple variables means that each row is potentially resampled. Let’s investigate this by looking at the first few rows of mythbusters_yawn:

-
head(mythbusters_yawn)
-
# A tibble: 6 x 3
-   subj group   yawn 
-  <int> <chr>   <chr>
-1     1 seed    yes  
-2     2 control yes  
-3     3 seed    no   
-4     4 seed    yes  
-5     5 seed    no   
-6     6 control no   
-

When we bootstrap this data, we are potentially pulling the subject’s readings multiple times. Thus, we could see the entries of "seed" for group and "no" for yawn together in a new row in a bootstrap sample. This is further seen by exploring the sample_n() function in dplyr on this smaller 6 row data frame comprised of head(mythbusters_yawn). The sample_n() function can perform this bootstrapping procedure and is similar to the rep_sample_n() function in infer, except that it is not repeated but rather only performs one sample with or without replacement.

-
set.seed(2019)
-
head(mythbusters_yawn) %>% 
-  sample_n(size = 6, replace = TRUE)
-
# A tibble: 6 x 3
-   subj group   yawn 
-  <int> <chr>   <chr>
-1     5 seed    no   
-2     5 seed    no   
-3     2 control yes  
-4     4 seed    yes  
-5     1 seed    yes  
-6     1 seed    yes  
-

We can see that in this bootstrap sample generated from the first six rows of mythbusters_yawn, we have some rows repeated. The same is true when we perform the generate() step in infer as done below.

-
bootstrap_distribution <- mythbusters_yawn %>% 
-  specify(formula = yawn ~ group, success = "yes") %>% 
-  generate(reps = 1000) %>% 
-  calculate(stat = "diff in props", order = c("seed", "control"))
-
bootstrap_distribution %>% 
-  visualize(bins = 20)
-

-

This distribution is roughly symmetric and bell-shaped but isn’t quite there. Let’s use the percentile-based method to compute a 95% confidence interval for the true difference in the proportion of those that yawn with and without a seed presented. The arguments are explicitly listed here but remember they are the defaults and simply get_ci() can be used.

-
bootstrap_distribution %>% 
-  get_ci(type = "percentile", level = 0.95)
-
# A tibble: 1 x 2
-  `2.5%` `97.5%`
-   <dbl>   <dbl>
-1 -0.219   0.293
-

The confidence interval shown here includes the value of 0. We'll see further in Chapter 11 what this means in terms of this difference being statistically significant or not, but let's examine it a bit here first. The range of plausible values for the difference in the proportion of those that yawned with and without a seed is between -0.219 and 0.293.

-

Therefore, we are not sure which proportion is larger. Some of the bootstrap statistics showed the proportion without a seed to be higher and others showed the proportion with a seed to be higher. If the confidence interval were entirely above zero, we would be relatively sure (about "95% confident") that the seed group had a higher proportion of yawning than the control group.
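One rough way (our own sketch) to see this split is to count how many of the bootstrap statistics fall on either side of 0, using the bootstrap_distribution object created above:

```{r}
library(dplyr)

# Illustrative: fraction of bootstrap statistics above and below 0
bootstrap_distribution %>% 
  summarize(prop_above_0 = mean(stat > 0),
            prop_below_0 = mean(stat < 0))
```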

-

Note that this all relates to the importance of denoting the order argument in the calculate() function. Since we specified "seed" and then "control", positive values for the statistic correspond to the "seed" proportion being higher, whereas negative values correspond to the "control" group being higher.

-

We therefore have evidence, via this confidence interval, suggesting that the Mythbusters show's declaring the "yawning is contagious" myth "confirmed" is not statistically appropriate.

-
-

Learning check

-
-

Practice problems to come soon!

-
- -
-
-
-
-

10.1 Conclusion

-
-

10.1.1 What’s to come?

-

This chapter introduced the notions of bootstrapping and confidence intervals as ways to build intuition about population parameters using only the original sample information. We also concluded with a glimpse into statistical significance and we’ll dig much further into this in Chapter 11 up next!


10.1.2 Script of R code


An R script file of all R code used in this chapter is available here.

diff --git a/docs/10-hypothesis-testing.html b/docs/10-hypothesis-testing.html

        Chapter 10 Hypothesis Testing

In preparation for our first print edition to be published by CRC Press in Fall 2019, we're remodeling this chapter a bit. Don't expect major changes in content, but rather only minor changes in presentation. Our remodeling will be complete and available online at ModernDive.com by early Summer 2019!

        We saw some of the main concepts of hypothesis testing introduced in Chapters 8 and 9. We will expand further on these ideas here and also provide a framework for understanding hypothesis tests in general. Instead of presenting you with lots of different formulas and scenarios, we hope to build a way to think about all hypothesis tests. You can then adapt to different scenarios as needed down the road when you encounter different statistical situations.

        The same can be said for confidence intervals. There was one general framework that applies to all confidence intervals and we elaborated on this using the infer package pipeline in Chapter 9. The specifics may change slightly for each variation, but the important idea is to understand the general framework so that you can apply it to more specific problems. We believe that this approach is much better in the long-term than teaching you specific tests and confidence intervals rigorously. You can find fully-worked out examples for five common hypothesis tests and their corresponding confidence intervals in Appendix B.

        We recommend that you carefully review these examples as they also cover how the general frameworks apply to traditional normal-based methodologies like the \(t\)-test and normal-theory confidence intervals. You’ll see there that these methods are just approximations for the general computational frameworks, but require conditions to be met for their results to be valid. The general frameworks using randomization, simulation, and bootstrapping do not hold the same sorts of restrictions and further advance computational thinking, which is one big reason for their emphasis throughout this textbook.


        10.1 When inference is not needed

        To further understand just how different the air_time variable is for BOS and SFO, let’s look at a boxplot:

        ggplot(data = bos_sfo, mapping = aes(x = dest, y = air_time)) +
           geom_boxplot()

        Since there is no overlap at all, we can conclude that the air_time for San Francisco flights is statistically greater (at any level of significance) than the air_time for Boston flights. This is a clear example of not needing to do anything more than some simple exploratory data analysis with descriptive statistics and data visualization to get an appropriate inferential conclusion. This is one reason why you should ALWAYS investigate the sample data first using dplyr and ggplot2 via exploratory data analysis.
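For instance, a quick numerical complement to the boxplot might look like the following sketch (assuming dplyr is loaded and bos_sfo contains the dest and air_time variables used above):

# Hypothetical EDA check: compare the two destinations numerically
bos_sfo %>% 
  group_by(dest) %>% 
  summarize(mean_time = mean(air_time), sd_time = sd(air_time))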

        As you get more and more practice with hypothesis testing, you’ll be better able to determine in many cases whether or not the results will be statistically significant. There are circumstances where it is difficult to tell, but you should always try to make a guess FIRST about significance after you have completed your data exploration and before you actually begin the inferential techniques.



10.4 Types of errors in hypothesis testing

      The risk of error is the price researchers pay for basing an inference about a population on a sample. With any reasonable sample-based procedure, there is some chance that a Type I error will be made and some chance that a Type II error will occur.

      To help understand the concepts of Type I error and Type II error, observe the following table:

FIGURE 10.2: Type I and Type II errors
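In the standard notation, the probability of a Type I error is usually denoted \(\alpha\) (the significance level set by the researcher) and the probability of a Type II error is denoted \(\beta\); for example, \(\alpha = P(\text{reject } H_0 \mid H_0 \text{ true})\).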

10.7.2 Comparing action and romance movies

      Let’s now visualize the distributions of rating across both levels of genre. Think about what type(s) of plot is/are appropriate here before you proceed:

      ggplot(data = movies_trimmed, aes(x = genre, y = rating)) +
         geom_boxplot()
FIGURE 10.3: Rating vs genre in the population


      10.7.4 Data

      We can now observe the distributions of our two sample ratings for both groups. Remember that these plots should be rough approximations of our population distributions of movie ratings for "Action" and "Romance" in our population of all movies in the movies data frame.

      ggplot(data = movies_genre_sample, aes(x = genre, y = rating)) +
         geom_boxplot()
FIGURE 10.5: Genre vs rating for our sample


      ggplot(data = movies_genre_sample, mapping = aes(x = rating)) +
         geom_histogram(binwidth = 1, color = "white") +
         facet_grid(genre ~ .)
FIGURE 10.6: Genre vs rating for our sample as faceted histogram

      @@ -932,8 +932,8 @@

      10.7.9 Distribution of A null distribution of simulated differences in sample means is created with the specification of stat = "diff in means" for the calculate() step. The null distribution is similar to the bootstrap distribution we saw in Chapter 9, but remember that it consists of statistics generated assuming the null hypothesis is true.
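For reference, a sketch of how such a null distribution could be generated with the infer pipeline is below (an assumption-laden reconstruction using the movies_genre_sample data frame from earlier; the chapter's exact chunk is not shown in this excerpt):

# Hypothetical reconstruction of the null distribution pipeline
null_distribution_two_means <- movies_genre_sample %>% 
  specify(formula = rating ~ genre) %>% 
  hypothesize(null = "independence") %>% 
  generate(reps = 5000, type = "permute") %>% 
  calculate(stat = "diff in means", order = c("Romance", "Action"))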

      We can now plot the distribution of these simulated differences in means:

      null_distribution_two_means %>% visualize()
FIGURE 10.7: Simulated differences in means histogram


      10.7.10 The p-value

      Remember that we are interested in seeing where our observed sample mean difference of 0.95 falls on this null/randomization distribution. We are interested in simply a difference here so “more extreme” corresponds to values in both tails on the distribution. Let’s shade our null distribution to show a visual representation of our \(p\)-value:

      null_distribution_two_means %>% 
         visualize(obs_stat = obs_diff, direction = "both")
FIGURE 10.8: Shaded histogram to show p-value


Remember that the observed difference in means was 0.95. We have shaded red all values at or above that value and also shaded red those values at or below its negative value (since this is a two-tailed test). By giving obs_stat = obs_diff, a darker vertical line is also shown at 0.95. To better estimate how large the \(p\)-value will be, we also increase the number of bins to 100 here from 20:

      null_distribution_two_means %>% 
         visualize(bins = 100, obs_stat = obs_diff, direction = "both")
FIGURE 10.9: Histogram with vertical lines corresponding to observed statistic

10.7.11 Corresponding confidence interval

percentile_ci_two_means <- movies_genre_sample %>% 
  specify(formula = rating ~ genre) %>% 
  # hypothesize(null = "independence") %>% 
  generate(reps = 5000) %>% 
  calculate(stat = "diff in means", order = c("Romance", "Action")) %>% 
  get_ci()
Setting `type = "bootstrap"` in `generate()`.
percentile_ci_two_means
# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>

10.8 Building theory-based methods

These traditional methods have been used for many decades, back to the time when researchers didn't have access to computers that could run 5000 simulations in a few seconds. They had to base their methods on probability theory instead. Many fields and researchers continue to use these methods and that is the biggest reason for their inclusion here. It's important to remember that a \(t\)-test or a \(z\)-test is really just an approximation of what you have seen in this chapter already using simulation and randomization. The focus here is on understanding how the shape of the \(t\)-curve comes about without digging deeply into the mathematical underpinnings.

      10.8.1 Example: \(t\)-test for two independent samples

What is commonly done in statistics is the process of standardization. What this entails is calculating the mean and standard deviation of a variable. Then you subtract the mean from each value of your variable and divide by the standard deviation. The most common standardization is known as the \(z\)-score. The formula for a \(z\)-score is \[Z = \frac{x - \mu}{\sigma},\] where \(x\) represents the value of a variable, \(\mu\) represents the mean of the variable, and \(\sigma\) represents the standard deviation of the variable. Thus, if your variable has 10 elements, each one has a corresponding \(z\)-score that gives how many standard deviations away that value is from its mean. \(z\)-scores are normally distributed with mean 0 and standard deviation 1. They have the common, bell-shaped pattern seen below.
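As a small side illustration (not from the original text), z-scores are easy to compute in R, either directly or with the built-in scale() function:

# Hypothetical example: standardizing ten values into z-scores
x <- c(2, 4, 6, 8, 10, 12, 14, 16, 18, 20)
z <- (x - mean(x)) / sd(x)   # equivalent to as.numeric(scale(x))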

Recall that we hardly ever know the mean and standard deviation of the population of interest. This is almost always the case when considering the means of two independent groups. To help account for us not knowing the population parameter values, we can use the sample statistics instead, but this comes with a bit of a price in terms of complexity.

Another form of standardization occurs when we need to use the sample standard deviations as estimates for the unknown population standard deviations. This standardization is often called the \(t\)-score. For the two independent samples case like what we have for comparing action movies to romance movies, the formula is \[T =\dfrac{ (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{ \sqrt{\dfrac{{s_1}^2}{n_1} + \dfrac{{s_2}^2}{n_2}} }\]
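To make the formula concrete, here is a hedged sketch of computing \(T\) by hand for the two samples, assuming the movies_genre_sample data frame used earlier and taking \(\mu_1 - \mu_2 = 0\) under the null hypothesis:

# Hypothetical by-hand computation of the two-sample t-statistic
stats_by_genre <- movies_genre_sample %>% 
  group_by(genre) %>% 
  summarize(xbar = mean(rating), s = sd(rating), n = n())
romance <- stats_by_genre %>% filter(genre == "Romance")
action  <- stats_by_genre %>% filter(genre == "Action")
T_stat  <- (romance$xbar - action$xbar) / 
  sqrt(romance$s^2 / romance$n + action$s^2 / action$n)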

      There is a lot to try to unpack here.

      null_slope_distn %>% 
         visualize(obs_stat = slope_obs, direction = "greater")

      In viewing the distribution above with shading to the right of our observed slope value of 0.067, we can see that we expect the p-value to be quite small. Let’s calculate it next using a similar syntax to what was done with visualize().

11.2 Bootstrapping for the regression slope

To further reinforce the process being done in the pipeline, we've added the type argument to generate(). This is automatically added based on the entries for specify() and hypothesize(), but it provides a useful way to check to make sure generate() is creating the samples in the desired way. In this case, we permuted the values of one variable across the values of the other 10,000 times and calculated a "slope" coefficient for each of these 10,000 generated samples.

      If instead we’d like to get a range of plausible values for the true slope value, we can use the process of bootstrapping:

bootstrap_slope_distn <- evals %>% 
  specify(score ~ bty_avg) %>%
  generate(reps = 10000, type = "bootstrap") %>% 
  calculate(stat = "slope")
bootstrap_slope_distn %>% visualize()

      Next we can use the get_ci() function to determine the confidence interval. Let’s do this in two different ways obtaining 99% confidence intervals. Remember that these denote a range of plausible values for an unknown true population slope parameter regressing teaching score on beauty score.

      percentile_slope_ci <- bootstrap_slope_distn %>% 
         get_ci(level = 0.99, type = "percentile")
11.3.3 Refresher: Regression tables

get_regression_table(score_model_3)

TABLE 11.2: Model 2: Regression table with interaction effect included

      11.3.4 Script of R code

      An R script file of all R code used in this chapter is available here.



      11.4.2 Residual analysis

gapminder2007 %>%
  filter(continent == "Asia") %>%
  arrange(lifeExp)
11.4.3 Residual analysis

  labs(x = "Income (in $1000)", y = "Residual", title = "Residuals vs income")

      FIGURE 11.9: Residuals vs credit limit and income


      ggplot(regression_points, aes(x = residual)) +
         geom_histogram(color = "white") +
         labs(x = "Residual")
      `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
      Relationship between credit card balance and credit limit/income

diff --git a/docs/12-inference-for-regression.html b/docs/12-inference-for-regression.html
deleted file mode 100644


      Chapter 12 Inference for Regression

Note: This chapter is still under construction. If you would like to contribute, please check us out on GitHub at https://github.com/moderndive/moderndive_book.

      Needed packages


      Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). Read Section 2.3 for information on how to install and load R packages.

library(ggplot2)
library(dplyr)
library(moderndive)
library(infer)
library(gapminder)
library(ISLR)

      12.1 Simulation-based Inference for Regression


We can also use the concept of permuting to determine the standard error of our null distribution and conduct a hypothesis test for a population slope. Let's go back to our example on teacher evaluations from Chapters 6 and 7. We'll begin in the basic regression setting to test whether we have evidence that a statistically significant positive relationship exists between teaching and beauty scores for the University of Texas professors. As we did in Chapter 6, teaching score will act as our outcome variable and bty_avg will be our explanatory variable. We will set up this hypothesis testing process as we have done before via the "There is Only One Test" diagram in Figure 11.1 using the infer package.


      12.1.1 Data


      Our data is stored in evals and we are focused on the measurements of the score and bty_avg variables there. Note that we don’t choose a subset of variables here since we will specify() the variables of interest using infer.

evals %>% 
  specify(score ~ bty_avg)
Response: score (numeric)
Explanatory: bty_avg (numeric)
# A tibble: 463 x 2
   score bty_avg
   <dbl>   <dbl>
 1   4.7    5   
 2   4.1    5   
 3   3.9    5   
 4   4.8    5   
 5   4.6    3   
 6   4.3    3   
 7   2.8    3   
 8   4.1    3.33
 9   3.4    3.33
10   4.5    3.17
# … with 453 more rows

      12.1.2 Test statistic \(\delta\)


      Our test statistic here is the sample slope coefficient that we denote with \(b_1\).


      12.1.3 Observed effect \(\delta^*\)


      We can use the specify() %>% calculate() shortcut here to determine the slope value seen in our observed data:

slope_obs <- evals %>% 
  specify(score ~ bty_avg) %>% 
  calculate(stat = "slope")

      The calculated slope value from our observed sample is \(b_1 = 0.067\).


      12.1.4 Model of \(H_0\)


      We are looking to see if a positive relationship exists so \(H_A: \beta_1 > 0\). Our null hypothesis is always in terms of equality so we have \(H_0: \beta_1 = 0\). In other words, when we assume the null hypothesis is true, we are assuming there is NOT a linear relationship between teaching and beauty scores for University of Texas professors.


      12.1.5 Simulated data


      Now to simulate the null hypothesis being true and recreating how our sample was created, we need to think about what it means for \(\beta_1\) to be zero. If \(\beta_1 = 0\), we said above that there is no relationship between the teaching and beauty scores. If there is no relationship, then any one of the teaching score values could have just as likely occurred with any of the other beauty score values instead of the one that it actually did fall with. We, therefore, have another example of permuting in our simulating of data under the null hypothesis.


      Tactile simulation


      We could use a deck of 926 note cards to create a tactile simulation of this permuting process. We would write the 463 different values of beauty scores on each of the 463 cards, one per card. We would then do the same thing for the 463 teaching scores putting them on one per card.


Next, we would lay out each of the 463 beauty score cards and shuffle the teaching score deck. Then, after shuffling the deck well, we would deal the cards out, one on top of each of the beauty score cards. We would then enter these new values in for teaching score and compute a sample slope based on this permutation. We could repeat this process many times, keeping track of our sample slope after each shuffle.
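In code, one round of this shuffling could be mimicked with base R's sample() function; the rough sketch below is an illustration only and is not the infer workflow used in the next subsection:

# Hypothetical single shuffle of the tactile simulation
shuffled_scores    <- sample(evals$score)                           # shuffle teaching scores
one_permuted_slope <- coef(lm(shuffled_scores ~ evals$bty_avg))[2]  # slope after shuffling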


      12.1.6 Distribution of \(\delta\) under \(H_0\)


      We can build our null distribution in much the same way we did in Chapter 11 using the generate() and calculate() functions. Note also the addition of the hypothesize() function, which lets generate() know to perform the permuting instead of bootstrapping.

null_slope_distn <- evals %>% 
  specify(score ~ bty_avg) %>%
  hypothesize(null = "independence") %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "slope")

null_slope_distn %>% 
  visualize(obs_stat = slope_obs, direction = "greater")


      In viewing the distribution above with shading to the right of our observed slope value of 0.067, we can see that we expect the p-value to be quite small. Let’s calculate it next using a similar syntax to what was done with visualize().


      12.1.7 The p-value

null_slope_distn %>% 
  get_pvalue(obs_stat = slope_obs, direction = "greater")
# A tibble: 1 x 1
  p_value
    <dbl>
1       0

      Since 0.067 falls far to the right of this plot beyond where any of the histogram bins have data, we can say that we have a \(p\)-value of 0. We, thus, have evidence to reject the null hypothesis in support of there being a positive association between the beauty score and teaching score of University of Texas faculty members.

Learning check

      (LC11.1) Repeat the inference above but this time for the correlation coefficient instead of the slope. Note the implementation of stat = "correlation" in the calculate() function of the infer package.


      12.2 Bootstrapping for the regression slope


      With the p-value calculated as 0 in the hypothesis test above, we can next determine just how strong of a positive slope value we might expect between the variables of teaching score and beauty score (bty_avg) for University of Texas faculty. Recall the infer pipeline above to compute the null distribution. Recall that this assumes the null hypothesis is true that there is no relationship between teaching score and beauty score using the hypothesize() function.

null_slope_distn <- evals %>% 
  specify(score ~ bty_avg) %>%
  hypothesize(null = "independence") %>% 
  generate(reps = 10000, type = "permute") %>% 
  calculate(stat = "slope")

To further reinforce the process being done in the pipeline, we've added the type argument to generate(). This is automatically added based on the entries for specify() and hypothesize(), but it provides a useful way to check to make sure generate() is creating the samples in the desired way. In this case, we permuted the values of one variable across the values of the other 10,000 times and calculated a "slope" coefficient for each of these 10,000 generated samples.


      If instead we’d like to get a range of plausible values for the true slope value, we can use the process of bootstrapping:

bootstrap_slope_distn %>% visualize()


      Next we can use the get_ci() function to determine the confidence interval. Let’s do this in two different ways obtaining 99% confidence intervals. Remember that these denote a range of plausible values for an unknown true population slope parameter regressing teaching score on beauty score.

percentile_slope_ci <- bootstrap_slope_distn %>% 
  get_ci(level = 0.99, type = "percentile")
percentile_slope_ci
# A tibble: 1 x 2
  `0.5%` `99.5%`
   <dbl>   <dbl>
1 0.0229   0.110

se_slope_ci <- bootstrap_slope_distn %>% 
  get_ci(level = 0.99, type = "se", point_estimate = slope_obs)
se_slope_ci
# A tibble: 1 x 2
   lower upper
   <dbl> <dbl>
1 0.0220 0.111

      With the bootstrap distribution being close to symmetric, it makes sense that the two resulting confidence intervals are similar.
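As a sanity check, the standard-error method can also be sketched by hand, assuming a roughly bell-shaped bootstrap distribution and using 2.58 as the multiplier for 99% coverage (the observed slope 0.067 comes from above):

# Hypothetical by-hand version of the SE-based 99% interval
se_boot <- sd(bootstrap_slope_distn$stat)
c(lower = 0.067 - 2.58 * se_boot, upper = 0.067 + 2.58 * se_boot)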


      12.3 Inference for multiple regression


      12.3.1 Refresher: Professor evaluations data


      Let’s revisit the professor evaluations data that we analyzed using multiple regression with one numerical and one categorical predictor. In particular

• \(y\): outcome variable of instructor evaluation score
• predictor variables
  • \(x_1\): numerical explanatory/predictor variable of age
  • \(x_2\): categorical explanatory/predictor variable of gender
library(ggplot2)
library(dplyr)
library(moderndive)

evals_multiple <- evals %>%
  select(score, ethnicity, gender, language, age, bty_avg, rank)

First, recall that we had two competing potential models to explain professors' teaching scores:

1. Model 1: No interaction term, i.e. both male and female profs have the same slope describing the associated effect of age on teaching score
2. Model 2: Includes an interaction term, i.e. we allow for male and female profs to have different slopes describing the associated effect of age on teaching score

      12.3.2 Refresher: Visualizations


      Recall the plots we made for both these models:

FIGURE 12.1: Model 1: no interaction effect included

FIGURE 12.2: Model 2: interaction effect included

      12.3.3 Refresher: Regression tables


      Last, let’s recall the regressions we fit. First, the regression with no -interaction effect: note the use of + in the formula in Table 12.1.

score_model_2 <- lm(score ~ age + gender, data = evals_multiple)
get_regression_table(score_model_2)

TABLE 12.1: Model 1: Regression table with no interaction effect included

term        estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept      4.484      0.125      35.79    0.000     4.238     4.730
age           -0.009      0.003      -3.28    0.001    -0.014    -0.003
gendermale     0.191      0.052       3.63    0.000     0.087     0.294

      Second, the regression with an interaction effect: note the use of * in the formula.

score_model_3 <- lm(score ~ age * gender, data = evals_multiple)
get_regression_table(score_model_3)

TABLE 12.2: Model 2: Regression table with interaction effect included

term            estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept          4.883      0.205      23.80    0.000     4.480     5.286
age               -0.018      0.004      -3.92    0.000    -0.026    -0.009
gendermale        -0.446      0.265      -1.68    0.094    -0.968     0.076
age:gendermale     0.014      0.006       2.45    0.015     0.003     0.024

      12.3.4 Script of R code


      An R script file of all R code used in this chapter is available here.


      12.4 Residual analysis


      12.4.1 Residual analysis


      Recall the residuals can be thought of as the error or the “lack-of-fit” between the observed value \(y\) and the fitted value \(\widehat{y}\) on the blue regression line in Figure 6.6. Ideally when we fit a regression model, we’d like there to be no systematic pattern to these residuals. We’ll be more specific as to what we mean by no systematic pattern when we see Figure 12.4 below, but let’s keep this notion imprecise for now. Investigating any such patterns is known as residual analysis and is the theme of this section.


      We’ll perform our residual analysis in two ways:

1. Creating a scatterplot with the residuals on the \(y\)-axis and the original explanatory variable \(x\) on the \(x\)-axis.
2. Creating a histogram of the residuals, thereby showing the distribution of the residuals.

      First, recall in Figure 6.8 above we created a scatterplot where

• on the vertical axis we had the teaching score \(y\),
• on the horizontal axis we had the beauty score \(x\), and
• the blue arrow represented the residual for one particular instructor.

      Instead, in Figure 12.3 below, let’s create a scatterplot where

• On the vertical axis we have the residual \(y-\widehat{y}\) instead
• On the horizontal axis we have the beauty score \(x\) as before:
# Get data
evals_ch6 <- evals %>%
  select(score, bty_avg, age)
# Fit regression model:
score_model <- lm(score ~ bty_avg, data = evals_ch6)
# Get regression table:
get_regression_table(score_model)
# A tibble: 2 x 7
  term      estimate std_error statistic p_value lower_ci upper_ci
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1 intercept    3.88      0.076     51.0        0    3.73     4.03 
2 bty_avg      0.067     0.016      4.09       0    0.035    0.099
# Get regression points
regression_points <- get_regression_points(score_model)

ggplot(regression_points, aes(x = bty_avg, y = residual)) +
  geom_point() +
  labs(x = "Beauty Score", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue", size = 1)

FIGURE 12.3: Plot of residuals over beauty score

      You can think of Figure 12.3 as Figure 6.8 but with the blue line flattened out to \(y=0\). Does it seem like there is no systematic pattern to the residuals? This question is rather qualitative and subjective in nature, thus different people may respond with different answers to the above question. However, it can be argued that there isn’t a drastic pattern in the residuals.


      Let’s now get a little more precise in our definition of no systematic pattern in the residuals. Ideally, the residuals should behave randomly. In addition,

1. the residuals should be on average 0. In other words, sometimes the regression model will make a positive error in that \(y - \widehat{y} > 0\), sometimes the regression model will make a negative error in that \(y - \widehat{y} < 0\), but on average the error is 0.
2. Further, the value and spread of the residuals should not depend on the value of \(x\).

      In Figure 12.4 below, we display some hypothetical examples where there are drastic patterns to the residuals. In Example 1, the value of the residual seems to depend on \(x\): the residuals tend to be positive for small and large values of \(x\) in this range, whereas values of \(x\) more in the middle tend to have negative residuals. In Example 2, while the residuals seem to be on average 0 for each value of \(x\), the spread of the residuals varies for different values of \(x\); this situation is known as heteroskedasticity.

FIGURE 12.4: Examples of less than ideal residual patterns


      The second way to perform a residual analysis is to look at the histogram of the residuals:

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 0.25, color = "white") +
  labs(x = "Residual")

FIGURE 12.5: Histogram of residuals


      This histogram seems to indicate that we have more positive residuals than negative. Since the residual \(y-\widehat{y}\) is positive when \(y > \widehat{y}\), it seems our fitted teaching score from the regression model tends to underestimate the true teaching score. This histogram has a slight left-skew in that there is a long tail on the left. Another way to say this is this data exhibits a negative skew. Is this a problem? Again, there is a certain amount of subjectivity in the response. In the authors’ opinion, while there is a slight skew/pattern to the residuals, it isn’t a large concern. On the other hand, others might disagree with our assessment. Here are examples of an ideal and less than ideal pattern to the residuals when viewed in a histogram:

FIGURE 12.6: Examples of ideal and less than ideal residual patterns


      In fact, we’ll see later on that we would like the residuals to be normally distributed with -mean 0. In other words, be bell-shaped and centered at 0! While this requirement and residual analysis in general may seem to some of you as not being overly critical at this point, we’ll see later after when we cover inference for regression in Chapter 12 that for the last five columns of the regression table from earlier (std error, statistic, p_value,lower_ci, and upper_ci) to have valid interpretations, the above three conditions should roughly hold.

Learning check

      (LC11.2) Continuing with our regression using age as the explanatory variable and teaching score as the outcome variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 463 instructors. Perform a residual analysis and look for any systematic patterns in the residuals. Ideally, there should be little to no pattern.


      12.4.2 Residual analysis

# Get data:
gapminder2007 <- gapminder %>%
  filter(year == 2007) %>% 
  select(country, continent, lifeExp, gdpPercap)
# Fit regression model:
lifeExp_model <- lm(lifeExp ~ continent, data = gapminder2007)
# Get regression table:
get_regression_table(lifeExp_model)
# A tibble: 5 x 7
  term              estimate std_error statistic p_value lower_ci upper_ci
  <chr>                <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1 intercept             54.8      1.02     53.4        0     52.8     56.8
2 continentAmericas     18.8      1.8      10.4        0     15.2     22.4
3 continentAsia         15.9      1.65      9.68       0     12.7     19.2
4 continentEurope       22.8      1.70     13.5        0     19.5     26.2
5 continentOceania      25.9      5.33      4.86       0     15.4     36.4
# Get regression points
regression_points <- get_regression_points(lifeExp_model)

      Recall our discussion on residuals from Section 12.4.1 where our goal was to investigate whether or not there was a systematic pattern to the residuals. Ideally since residuals can be thought of as error, there should be no such pattern. While there are many ways to do such residual analysis, we focused on two approaches based on visualizations.

1. A plot with residuals on the vertical axis and the predictor (in this case continent) on the horizontal axis
2. A histogram of all residuals

      First, let’s plot the residuals versus continent in Figure 12.7, but also let’s plot all 142 points with a little horizontal random jitter by setting the width = 0.1 parameter in geom_jitter():

ggplot(regression_points, aes(x = continent, y = residual)) +
  geom_jitter(width = 0.1) + 
  labs(x = "Continent", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue")

FIGURE 12.7: Plot of residuals over continent


      We observe

1. There seems to be a rough balance of both positive and negative residuals for all 5 continents.
2. However, there is one clear outlier in Asia, which has a residual with the largest deviation away from 0.

      Let’s investigate the 5 countries in Asia with the shortest life expectancy:

gapminder2007 %>%
  filter(continent == "Asia") %>%
  arrange(lifeExp)

TABLE 12.3: Countries in Asia with shortest life expectancy

country      continent  lifeExp  gdpPercap
Afghanistan  Asia          43.8        975
Iraq         Asia          59.5       4471
Cambodia     Asia          59.7       1714
Myanmar      Asia          62.1        944
Yemen, Rep.  Asia          62.7       2281

This was the earlier identified residual for Afghanistan of -26.9. Unfortunately given recent geopolitical turmoil, individuals who live in Afghanistan and, in particular in 2007, have a drastically lower life expectancy.


      Second, let’s look at a histogram of all 142 values of -residuals in Figure 12.8. In this case, the residuals form a -rather nice bell-shape, although there are a couple of very low and very high -values at the tails. As we said previously, searching for patterns in residuals -can be somewhat subjective, but ideally we hope there are no “drastic” patterns.

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 5, color = "white") +
  labs(x = "Residual")

FIGURE 12.8: Histogram of residuals

Learning check

      (LC11.3) Continuing with our regression using gdpPercap as the outcome variable and continent as the explanatory variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 142 countries in 2007 and perform a residual analysis to look for any systematic patterns in the residuals. Is there a pattern? Please keep in mind that these types of questions are somewhat subjective and different people will most likely have different answers. The focus should be on being able to justify the conclusions made.


      12.4.3 Residual analysis


Recall in Section 12.4.1, our first residual analysis plot investigated the presence of any systematic pattern in the residuals when we had a single numerical predictor: bty_avg. For the Credit card dataset, since we have two numerical predictors, Limit and Income, we must perform this twice:

# Get data:
Credit <- Credit %>%
  select(Balance, Limit, Income, Rating, Age)
# Fit regression model:
Balance_model <- lm(Balance ~ Limit + Income, data = Credit)
# Get regression table:
get_regression_table(Balance_model)
# A tibble: 3 x 7
  term      estimate std_error statistic p_value lower_ci upper_ci
  <chr>        <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1 intercept -385.       19.5       -19.8       0 -423.    -347.   
2 Limit        0.264     0.006      45.0       0    0.253    0.276
3 Income      -7.66      0.385     -19.9       0   -8.42    -6.91 
# Get regression points
regression_points <- get_regression_points(Balance_model)
ggplot(regression_points, aes(x = Limit, y = residual)) +
  geom_point() +
  labs(x = "Credit limit (in $)", 
       y = "Residual", 
       title = "Residuals vs credit limit")
  
ggplot(regression_points, aes(x = Income, y = residual)) +
  geom_point() +
  labs(x = "Income (in $1000)", 
       y = "Residual", 
       title = "Residuals vs income")

FIGURE 12.9: Residuals vs credit limit and income


In this case, there does appear to be a systematic pattern to the residuals, as the scatter of the residuals around the line \(y=0\) is definitely not consistent. This behavior of the residuals is further evidenced by the histogram of residuals in Figure 12.10. We observe that the residuals have a slight right-skew (recall we say that data is right-skewed, or positively-skewed, if there is a tail to the right). Ideally, these residuals should be bell-shaped around a residual value of 0.

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(color = "white") +
  labs(x = "Residual")

FIGURE 12.10: Relationship between credit card balance and credit limit/income


      Another way to interpret this histogram is that since the residual is computed as \(y - \widehat{y}\) = balance - balance_hat, we have some values where the fitted value \(\widehat{y}\) is very much lower than the observed value \(y\). In other words, we are underestimating certain credit card holders’ balances by a very large amount.
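One way to see this concretely (a hedged sketch, assuming the regression_points data frame from above with its residual column) is to list the cases with the largest positive residuals, i.e. the most underestimated balances:

# Hypothetical check: credit card holders whose balance is most underestimated
regression_points %>% 
  arrange(desc(residual)) %>% 
  head(5)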

Learning check

      (LC11.4) Continuing with our regression using Rating and Age as the explanatory variables and credit card Balance as the outcome variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 400 credit card holders. Perform a residual analysis and look for any systematic patterns in the residuals.


      12.4.4 Residual analysis

# Get data:
evals_ch7 <- evals %>%
  select(score, age, gender)
# Fit regression model:
score_model_2 <- lm(score ~ age + gender, data = evals_ch7)
# Get regression table:
get_regression_table(score_model_2)
# A tibble: 3 x 7
  term       estimate std_error statistic p_value lower_ci upper_ci
  <chr>         <dbl>     <dbl>     <dbl>   <dbl>    <dbl>    <dbl>
1 intercept     4.48      0.125     35.8    0        4.24     4.73 
2 age          -0.009     0.003     -3.28   0.001   -0.014   -0.003
3 gendermale    0.191     0.052      3.63   0        0.087    0.294
# Get regression points
regression_points <- get_regression_points(score_model_2)

      As always, let’s perform a residual analysis first with a histogram, which we can facet by gender:

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 0.25, color = "white") +
  labs(x = "Residual") +
  facet_wrap(~gender)

FIGURE 12.11: Interaction model histogram of residuals


      Second, the residuals as compared to the predictor variables:

• \(x_1\): numerical explanatory/predictor variable of age
• \(x_2\): categorical explanatory/predictor variable of gender
ggplot(regression_points, aes(x = age, y = residual)) +
  geom_point() +
  labs(x = "age", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue", size = 1) +
  facet_wrap(~ gender)

FIGURE 12.12: Interaction model residuals vs predictor

diff --git a/docs/12-thinking-with-data.html b/docs/12-thinking-with-data.html
12.2 Case study: Effective data storytelling


      Note: This section is still under construction. If you would like to contribute, please check us out on GitHub at https://github.com/moderndive/moderndive_book.


      As we’ve progressed throughout this book, you’ve seen how to work with data in a variety of ways. You’ve learned effective strategies for plotting data by understanding which types of plots work best for which combinations of variable types. You’ve summarized data in table form and calculated summary statistics for a variety of different variables. Further, you’ve seen the value of inference as a process to come to conclusions about a population by using a random sample. Lastly, you’ve explored how to use linear regression and the importance of checking the conditions required to make it a valid procedure. All throughout, you’ve learned many computational techniques and focused on reproducible research in writing R code. We now present another case study, but this time of the “effective data storytelling” done by data journalists around the world. Great data stories don’t mislead the reader, but rather engulf them in understanding the importance that data plays in our lives through the captivation of storytelling.


      12.2.2 US Births in 1999

      ggplot(US_births_1999, aes(x = date, y = births)) +
         geom_line() +
         labs(x = "Data", y = "Number of births", title = "US Births in 1999")

We see a big valley occurring just before January 1st, 2000, most likely due to the holiday season. However, what about the major peak of over 14,000 births occurring just before October 1st, 1999? What could be the reason for this anomalously high spike in births? Time to think with data!

      12.2.3 Other examples

      Stand by!


      12.2.4 Script of R code

      An R script file of all R code used in this chapter is available here.

      diff --git a/docs/2-getting-started.html b/docs/2-getting-started.html index 597e7ed2c..7a092a09e 100644 --- a/docs/2-getting-started.html +++ b/docs/2-getting-started.html @@ -6,20 +6,20 @@ Chapter 2 Getting Started with Data in R | Statistical Inference via Data Science - + - + - + @@ -214,9 +214,10 @@

          2.5.1 Additional resources

          2.5.2 What’s to come?

          As we stated earlier however, the best way to learn R is to learn by doing. We now start the “data science” portion of the book in Chapter 3 with what we feel is the most important tool in a data scientist’s toolbox: data visualization. We will continue to explore the data included in the nycflights13 package through data visualization. We’ll see that data visualization is a powerful tool to add to our toolbox for data exploring that provides additional insight to what the View() and glimpse() functions can provide.


FIGURE 2.1: ModernDive flowchart

diff --git a/docs/3-viz.html b/docs/3-viz.html


              Chapter 3 Data Visualization

              We begin the development of your data science toolbox with data visualization. By visualizing our data, we gain valuable insights that we couldn’t initially see from just looking at the raw data in spreadsheet form. We will use the ggplot2 package as it provides an easy way to customize your plots. ggplot2 is rooted in the data visualization theory known as The Grammar of Graphics (Wilkinson 2005).


              At the most basic level, graphics/plots/charts (we use these terms interchangeably in this book) provide a nice way for us to get a sense for how quantitative variables compare in terms of their center (where the values tend to be located) and their spread (how they vary around the center). Graphics should be designed to emphasize the findings and insight you want your audience to understand. This does however require a balancing act. On the one hand, you want to highlight as many meaningful relationships and interesting findings as possible; on the other you don’t want to include so many as to overwhelm your audience.

              As we will see, plots/graphics also help us to identify patterns and outliers in our data. We will see that a common extension of these ideas is to compare the distribution of one quantitative variable (i.e., what the spread of a variable looks like or how the variable is distributed in terms of its values) as we go across the levels of a different categorical variable.

              Needed packages


              3.1 The Grammar of Graphics


              We begin with a discussion of a theoretical framework for data visualization known as “The Grammar of Graphics,” which serves as the foundation for the ggplot2 package. Think of how we construct sentences in English to form sentences by combining different elements, like nouns, verbs, particles, subjects, objects, etc. However, we can’t just combine these elements in any arbitrary order; we must do so following a set of rules known as a linguistic grammar. Similarly to a linguistic grammar, “The Grammar of Graphics” define a set of rules for constructing statistical graphics by combining different types of layers. This grammar was created by Leland Wilkinson (Wilkinson 2005) and has been implemented in a variety of data visualization software including R.

              3.1.1 Components of the Grammar

              In short, the grammar tells us that:


              3.1.3 Other components


              Other more complex components like scales and coordinate systems are left for a more advanced text such as R for Data Science (Grolemund and Wickham 2016). Generally speaking, the Grammar of Graphics allows for a high degree of customization of plots and also a consistent framework for easily updating and modifying them.

              3.1.4 ggplot2 package


              3.4.1 Linegraphs via geom_line

              3.4.2 Summary


              Linegraphs, just like scatterplots, display the relationship between two numerical variables. However it is preferred to use linegraphs over scatterplots when the variable on the x-axis (i.e. the explanatory variable) has an inherent ordering, like some notion of time.



              3.5 5NG#3: Histograms

              The remaining bins all have a similar interpretation.

              3.5.1 Histograms via geom_histogram


              Let’s now present the ggplot() code to plot your first histogram! Unlike with scatterplots and linegraphs, there is now only one variable being mapped in aes(): the single numerical variable temp. The y-aesthetic of a histogram gets computed for you automatically. Furthermore, the geometric object layer is now a geom_histogram()

              ggplot(data = weather, mapping = aes(x = temp)) +
                 geom_histogram()
              `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

              3.5.2 Adjusting the bins

              Using the first method, we have the power to specify how many bins we would like to cut the x-axis up in. As mentioned in the previous section, the default number of bins is 30. We can override this default, to say 40 bins, as follows:

              ggplot(data = weather, mapping = aes(x = temp)) +
                 geom_histogram(bins = 40, color = "white")
FIGURE 3.14: Histogram with 40 bins.

              Using the second method, instead of specifying the number of bins, we specify the width of the bins by using the binwidth argument in the geom_histogram() layer. For example, let’s set the width of each bin to be 10°F.

              ggplot(data = weather, mapping = aes(x = temp)) +
                 geom_histogram(binwidth = 10, color = "white")

              FIGURE 3.15: Histogram with binwidth 10.


              3.5.3 Summary

              3.6 Facets


              Before continuing the 5NG, let’s briefly introduce a new concept called faceting. Faceting is used when we’d like to split a particular visualization of variables by another variable. This will create multiple copies of the same type of plot with matching x and y axes, but whose content will differ.

For example, suppose we were interested in looking at how the histogram of hourly temperature recordings at the three NYC airports we saw in Section 3.5 differed by month. We would “split” this histogram by the 12 possible months in a given year, in other words plot histograms of temp for each month. We do this by adding a facet_wrap(~ month) layer.

              ggplot(data = weather, mapping = aes(x = temp)) +
   geom_histogram(binwidth = 5, color = "white") +
   facet_wrap(~ month)

              FIGURE 3.16: Faceted histogram.


              Note the use of the tilde ~ before month in facet_wrap(). The tilde is required and you’ll receive the error Error in as.quoted(facets) : object 'month' not found if you don’t include it before month here. We can also specify the number of rows and columns in the grid by using the nrow and ncol arguments inside of facet_wrap(). For example, say we would like our faceted plot to have 4 rows instead of 3. Add the nrow = 4 argument to facet_wrap(~ month)

              ggplot(data = weather, mapping = aes(x = temp)) +
                 geom_histogram(binwidth = 5, color = "white") +
                 facet_wrap(~ month, nrow = 4)

              3.7.1 Boxplots via geom_boxplot
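For reference, here is a minimal sketch of the kind of boxplot these learning checks refer to, with temp split by month converted to a categorical variable via factor(); this mirrors the description in LC3.24 below, though the exact styling of the book's figure may differ:

ggplot(data = weather, mapping = aes(x = factor(month), y = temp)) +
   geom_boxplot()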

              (LC3.22) What does the dot at the bottom of the plot for May correspond to? Explain what might have occurred in May to produce this point.

              (LC3.23) Which months have the highest variability in temperature? What reasons can you give for this?


(LC3.24) We looked at the distribution of the numerical variable temp split by the numerical variable month that we converted to a categorical variable using the factor() function. Why would a boxplot of temp split by the numerical variable pressure similarly converted to a categorical variable using the factor() function not be informative?

              (LC3.25) Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram?


              3.8.2 Must avoid pie charts!

            While it is quite difficult to answer these questions when looking at the pie chart in Figure 3.27, we can much more easily answer these questions using the barchart in Figure 3.26. This is true since barplots present the information in a way such that comparisons between categories can be made with single horizontal lines, whereas pie charts present the information in a way such that comparisons between categories must be made by comparing angles.

            There may be one exception of a pie chart not to avoid courtesy Nathan Yau at FlowingData.com, but we will leave this for the reader to decide:

            The only good pie chart

FIGURE 3.28: The only good pie chart

3.8.3 Two categorical variables

Barplots are the go-to way to visualize the frequency of different categories, or levels, of a single categorical variable. Another use of barplots is to visualize the joint distribution of two categorical variables at the same time. Let’s examine the joint distribution of outgoing domestic flights from NYC by carrier and origin, or in other words the number of flights for each carrier and origin combination. For example, the number of WestJet flights from JFK, the number of WestJet flights from LGA, the number of WestJet flights from EWR, the number of American Airlines flights from JFK, and so on. Recall the ggplot() code that created the barplot of carrier frequency in Figure 3.26:

            ggplot(data = flights, mapping = aes(x = carrier)) +
               geom_bar()

            We can now map the additional variable origin by adding a fill = origin inside the aes() aesthetic mapping; the fill aesthetic of any bar corresponds to the color used to fill the bars.

            ggplot(data = flights, mapping = aes(x = carrier, fill = origin)) +
               geom_bar()

Another alternative to stacked barplots is the side-by-side barplot, also known as a dodged barplot. The code to create a side-by-side barplot is identical to the code to create a stacked barplot, but with a position = "dodge" argument added to geom_bar(). In other words, we are overriding the default barplot type, which is a stacked barplot, and specifying it to be a side-by-side barplot.

            ggplot(data = flights, mapping = aes(x = carrier, fill = origin)) +
               geom_bar(position = "dodge")

            FIGURE 3.31: Side-by-side AKA dodged barplot comparing the number of flights by carrier and origin.


            3.9.4 What’s to come

ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = temp)) + 
   geom_line()

            These two code segments were a preview of Chapter 4 on data wrangling where we’ll delve further into the dplyr package. Data wrangling is the process of transforming and modifying existing data with the intent of making it more appropriate for analysis purposes. For example, the two code segments used the filter() function to create new data frames (alaska_flights and early_january_weather) by choosing only a subset of rows of existing data frames (flights and weather). In this next chapter, we’ll formally introduce the filter() and other data wrangling functions as well as the pipe operator %>% which allows you to combine multiple data wrangling actions into a single sequential chain of actions. On to Chapter 4 on data wrangling!

diff --git a/docs/4-wrangling.html b/docs/4-wrangling.html
index 1fff7858d..372aee794 100644
--- a/docs/4-wrangling.html
+++ b/docs/4-wrangling.html

                Chapter 4 Data Wrangling


So far in our journey, we’ve seen how to look at data saved in data frames using the glimpse() and View() functions in Chapter 2 and how to create data visualizations using the ggplot2 package in Chapter 3. In particular we studied what we term the “five named graphs” (5NG):

1. scatterplots via geom_point()
2. linegraphs via geom_line()
3. boxplots via geom_boxplot()
4. histograms via geom_histogram()
5. barplots via geom_bar() or geom_col()

We created these visualizations using the “Grammar of Graphics”, which maps variables in a data frame to the aesthetic attributes of one of the above 5 geometric objects. We can also control other aesthetic attributes of the geometric objects such as the size and color as seen in the Gapminder data example in Figure 3.1.
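As an illustrative sketch of mapping extra aesthetics like color and size (here gapminder_2007 is a hypothetical data frame of 2007 observations; the variable names gdpPercap, lifeExp, pop, and continent come from the gapminder package):

ggplot(data = gapminder_2007, 
       mapping = aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
   geom_point()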


                Recall however in Section 3.9.4 we discussed that for two of our visualizations we needed transformed/modified versions of existing data frames. Recall for example the scatterplot of departure and arrival delay only for Alaska Airlines flights. In order to create this visualization, we needed to first pare down the flights data frame to a new data frame alaska_flights consisting of only carrier == "AS" flights using the filter() function.

                alaska_flights <- flights %>% 
                   filter(carrier == "AS")
                 

                In this chapter, we’ll introduce a series of functions from the dplyr package that will allow you to take a data frame and

1. filter() its existing rows to only pick out a subset of them. For example, the alaska_flights data frame above.
2. summarize() one of its columns/variables with a summary statistic. Examples include the median and interquartile range of temperatures as we saw in Section 3.7 on boxplots.
3. group_by() its rows. In other words assign different rows to be part of the same group and report summary statistics for each group separately. For example, say perhaps you don’t want a single overall average departure delay dep_delay for all three origin airports combined, but rather three separate average departure delays, one for each of the three origin airports.
4. mutate() its existing columns/variables to create new ones. For example, convert hourly temperature recordings from °F to °C.
5. arrange() its rows. For example, sort the rows of weather in ascending or descending order of temp.
6. join() it with another data frame by matching along a “key” variable. In other words, merge these two data frames together.

                Notice how we used computer code font to describe the actions we want to take on our data frames. This is because the dplyr package for data wrangling that we’ll introduce in this chapter has intuitively verb-named functions that are easy to remember.


                We’ll start by introducing the pipe operator %>%, which allows you to combine multiple data wrangling verb-named functions into a single sequential chain of actions.

                Needed packages

                Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). If needed, read Section 2.3 for information on how to install and load R packages.
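A sketch of the package-loading code implied here, assuming the chapter relies on the dplyr, ggplot2, and nycflights13 packages used throughout:

library(dplyr)
library(ggplot2)
library(nycflights13)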


                4.1 The pipe operator: %>%


                Before we start data wrangling, let’s first introduce a very nifty tool that gets loaded along with the dplyr package: the pipe operator %>%. Say you would like to perform a hypothetical sequence of operations on a hypothetical data frame x using hypothetical functions f(), g(), and h():

1. Take x then
2. Use x as an input to a function f() then
3. Use the output of f(x) as an input to a function g() then
4. Use the output of g(f(x)) as an input to a function h()

                One way to achieve this sequence of operations is by using nesting parentheses as follows:

                h(g(f(x)))

                The above code isn’t so hard to read since we are applying only three functions: f(), then g(), then h(). However, you can imagine that this can get progressively harder and harder to read as the number of functions applied in your sequence increases. This is where the pipe operator %>% comes in handy. %>% takes one output of one function and then “pipes” it to be the input of the next function. Furthermore, a helpful trick is to read %>% as “then.” For example, you can obtain the same output as the above sequence of operations as follows:

x %>% 
   f() %>% 
   g() %>% 
   h()

You would read this sequence as:

• Take x then
• Use this output as the input to the next function f() then
• Use this output as the input to the next function g() then
• Use this output as the input to the next function h()
              • -

                So while both approaches above would achieve the same goal, the latter is much more human-readable because you can read the sequence of operations line-by-line. But what are x, f(), g(), and h()? Throughout this chapter on data wrangling:

                +

                So while both approaches above would achieve the same goal, the latter is much more human-readable because you can read the sequence of operations line-by-line. But what are the hypothetical x, f(), g(), and h()? Throughout this chapter on data wrangling:

• The starting value x will be a data frame. For example: flights.
• The sequence of functions, here f(), g(), and h(), will be a sequence of any number of the 6 data wrangling verb-named functions we listed in the introduction to this chapter. For example: filter(carrier == "AS").
• The result will be the transformed/modified data frame that you want. For example: a data frame consisting of only the subset of rows in flights corresponding to Alaska Airlines flights.

                  Much like when adding layers to a ggplot() using the + sign at the end of lines, you form a single chain of data wrangling operations by combining verb-named functions into a single sequence with pipe operators %>% at the end of lines. So continuing our example involving Alaska Airlines flights, we form a chain using the pipe operator %>% and save the resulting data frame in alaska_flights:

                  alaska_flights <- flights %>% 
                     filter(carrier == "AS")

Keep in mind, there are many more advanced data wrangling functions than just the 6 listed in the introduction to this chapter; you’ll see some examples of these in Section 4.8. However, just with these 6 verb-named functions you’ll be able to perform a broad array of data wrangling tasks for the rest of this book.



                4.2 filter rows

                FIGURE 4.1: Diagram of


The filter() function here works much like the “Filter” option in Microsoft Excel; it allows you to specify criteria about the values of variables in your dataset and then keeps only the rows that match those criteria. We begin by focusing only on flights from New York City to Portland, Oregon. The dest code (or airport code) for Portland, Oregon is "PDX". Run the following and look at the resulting spreadsheet to ensure that only flights heading to Portland are chosen here:

                portland_flights <- flights %>% 
                   filter(dest == "PDX")
                 View(portland_flights)

• The ordering of the commands:
  • Take the flights data frame flights then
  • filter the data frame so that only those rows where the dest equals "PDX" are included.
• We test for equality using the double equal sign == and not a single equal sign =. In other words filter(dest = "PDX") will yield an error. This is a convention across many programming languages. If you are new to coding, you’ll probably forget to use the double equal sign == a few times before you get the hang of it.

You can use other mathematical operations beyond just == to form criteria:

• > corresponds to “greater than”
• < corresponds to “less than”
• >= corresponds to “greater than or equal to”
• <= corresponds to “less than or equal to”
• != corresponds to “not equal to”. The ! is used in many programming languages to indicate “not”.

Furthermore, you can combine multiple criteria together using operators that make comparisons:

• | corresponds to “or”
• & corresponds to “and”

To see many of these in action, let’s filter flights for all rows that:

• Departed from JFK airport and
• Were heading to Burlington, Vermont ("BTV") or Seattle, Washington ("SEA") and
• Departed in the months of October, November, or December.

Run the following:

btv_sea_flights_fall <- flights %>% 
  filter(origin == "JFK" & (dest == "BTV" | dest == "SEA") & month >= 10)
                 View(btv_sea_flights_fall)

Note that even though colloquially speaking one might say “all flights heading to Burlington, Vermont and Seattle, Washington,” in terms of computer operations, we really mean “all flights heading to Burlington, Vermont or to Seattle, Washington.” For a given row in the data, dest can be “BTV”, “SEA”, or something else, but not “BTV” and “SEA” at the same time. Furthermore, note the careful use of parentheses around the dest == "BTV" | dest == "SEA".


                We can often skip the use of & and just separate our conditions with a comma. In other words the code above will return the identical output btv_sea_flights_fall as this code below:

btv_sea_flights_fall <- flights %>% 
  filter(origin == "JFK", (dest == "BTV" | dest == "SEA"), month >= 10)
View(btv_sea_flights_fall)

                Let’s present another example that uses the ! “not” operator to pick rows that don’t match a criteria. As mentioned earlier, the ! can be read as “not.” Here we are filtering rows corresponding to flights that didn’t go to Burlington, VT or Seattle, WA.

                not_BTV_SEA <- flights %>% 
                   filter(!(dest == "BTV" | dest == "SEA"))
                 View(not_BTV_SEA)

                Again, note the careful use of parentheses around the (dest == "BTV" | dest == "SEA"). If we didn’t use parentheses as follows:

flights %>% 
  filter(!dest == "BTV" | dest == "SEA")

                We would be returning all flights not headed to "BTV" or those headed to "SEA", which is an entirely different resulting data frame.

                Now say we have a large list of airports we want to filter for, say BTV, SEA, PDX, SFO, and BDL. We could continue to use the | or operator as so:

                many_airports <- flights %>% 
                   filter(dest == "BTV" | dest == "SEA" | dest == "PDX" | dest == "SFO" | dest == "BDL")
                 View(many_airports)

                but as we progressively include more airports, this will get unwieldy. A slightly shorter approach uses the %in% operator:

                many_airports <- flights %>% 
                   filter(dest %in% c("BTV", "SEA", "PDX", "SFO", "BDL"))
                 View(many_airports)

This code filters flights for all rows where dest is in the vector of airports c("BTV", "SEA", "PDX", "SFO", "BDL"). Recall from Chapter 2 that the c() function “combines” or “concatenates” values into a vector. Both outputs of many_airports are the same, but as you can see the latter takes much less time to code.
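As a quick illustration of how %in% behaves on its own (the values here are illustrative), it checks each element of the left-hand vector for membership in the right-hand vector and returns TRUE or FALSE:

c("BTV", "ORD") %in% c("BTV", "SEA", "PDX", "SFO", "BDL")
[1]  TRUE FALSE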


As a final note we point out that filter() should often be among the first verbs you apply to your data. This cleans your dataset to only those rows you care about, or put differently, it narrows down the scope of your data frame to just the observations you care about.

                Learning check


                (LC4.1) What’s another way of using the “not” operator ! to filter only the rows that are not going to Burlington VT nor Seattle WA in the flights data frame? Test this out using the code above.


                4.3 summarize variables


                The next common task when working with data is to return summary statistics: a single numerical value that summarizes a large number of values, for example the mean/average or the median. Other examples of summary statistics that might not immediately come to mind include the sum, the smallest value AKA the minimum, the largest value AKA the maximum, and the standard deviation; they are all summaries of a large number of values.

FIGURE 4.2: Summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet


FIGURE 4.3: Another summarize diagram from Data Wrangling with dplyr and tidyr cheatsheet


Let’s calculate the mean and the standard deviation of the temperature variable temp in the weather data frame included in the nycflights13 package (see Appendix A). We’ll do this in one step using the summarize() function from the dplyr package and save the results in a new data frame summary_temp with columns/variables mean and std_dev. Note you can also use the UK spelling of “summarise” using the summarise() function.


                As shown in Figures 4.2 and 4.3, the weather data frame’s many rows will be collapsed into a single row of just the summary values, in this case the mean and standard deviation:

                summary_temp <- weather %>% 
  summarize(mean = mean(temp), std_dev = sd(temp))
                 summary_temp
                # A tibble: 1 x 2
                    mean std_dev
                   <dbl>   <dbl>
                 1    NA      NA

Why are the values returned NA? As we saw in Section 3.3.1 when creating the scatterplot of departure and arrival delays for alaska_flights, NA is how R encodes missing values where NA indicates “not available” or “not applicable.” If a value for a particular row and a particular column does not exist, NA is stored instead. Values can be missing for many reasons. Perhaps the data was collected but someone forgot to enter it? Perhaps the data was not collected at all because it was too difficult? Perhaps there was an erroneous value that someone entered that has been corrected to read as missing? You’ll often encounter issues with missing values when working with real data.


                Going back to our summary_temp output above, by default any time you try to calculate a summary statistic of a variable that has one or more NA missing values in R, then NA is returned. To work around this fact, you can set the na.rm argument to TRUE, where rm is short for “remove”; this will ignore any NA missing values and only return the summary value for all non-missing values.
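A tiny illustration of this behavior on a toy vector (the values here are illustrative):

mean(c(1, 2, NA))
[1] NA
mean(c(1, 2, NA), na.rm = TRUE)
[1] 1.5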


The code below computes the mean and standard deviation of all non-missing values of temp. Notice how na.rm = TRUE is used as an argument to the mean() and sd() functions individually, and not to the summarize() function.

                summary_temp <- weather %>% 
                   summarize(mean = mean(temp, na.rm = TRUE), 
                             std_dev = sd(temp, na.rm = TRUE))
                 summary_temp
# A tibble: 1 x 2
   mean std_dev
  <dbl>   <dbl>
1  55.3    17.8

However, one needs to be cautious whenever ignoring missing values as we’ve done above. In the upcoming Learning Checks we’ll consider the possible ramifications of blindly sweeping rows with missing values “under the rug.” This is in fact why the na.rm argument to any summary statistic function in R is set to FALSE by default; in other words, do not ignore rows with missing values by default. R is alerting you to the presence of missing data and you should be mindful of this missingness and any potential causes of this missingness throughout your analysis.


What other summary statistic functions can we use inside the summarize() verb? As seen in Figure 4.3, you can use any function in R that takes many values and returns just one. Here are just a few:

                • mean(): the mean AKA the average
                • sd(): the standard deviation, which is a measure of spread
• median(): the median, i.e. the middle value
• min() and max(): the minimum and maximum values respectively
• IQR(): the interquartile range
• sum(): the total amount when adding multiple numbers
• n(): a count of the number of rows/observations
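For instance, a minimal sketch combining a few of these inside summarize() (the column names min_temp, max_temp, and num_rows are illustrative):

weather %>% 
   summarize(min_temp = min(temp, na.rm = TRUE), 
             max_temp = max(temp, na.rm = TRUE), 
             num_rows = n())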

                  4.4 group_by rows


Say that instead of a single mean temperature for the whole year, you would like 12 mean temperatures, one for each of the 12 months separately. In other words, we would like to compute the mean temperature split by month AKA sliced by month AKA aggregated by month. We can do this by “grouping” temperature observations by the values of another variable, in this case by the 12 values of the variable month. Run the following code:

                summary_monthly_temp <- weather %>% 
                   group_by(month) %>% 
                   summarize(mean = mean(temp, na.rm = TRUE), 
             std_dev = sd(temp, na.rm = TRUE))
summary_monthly_temp


This code is identical to the previous code that created summary_temp, but with an extra group_by(month) added before the summarize(). Grouping the weather dataset by month and then applying the summarize() function yields a data frame that displays the mean and standard deviation of temperature split by the 12 months of the year.


                It is important to note that the group_by() function doesn’t change data frames by itself. Rather it changes the meta-data, or data about the data, specifically the group structure. It is only after we apply the summarize() function that the data frame changes. For example, let’s consider the diamonds data frame included in the ggplot2 package. Run this code, specifically in the console:

diamonds

# A tibble: 53,940 x 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39
# … with 53,930 more rows

                Observe that the first line of the output reads # A tibble: 53,940 x 10. This is an example of meta-data, in this case the number of observations/rows and variables/columns in diamonds. The actual data itself are the subsequent table of values.


                Now let’s pipe the diamonds data frame into group_by(cut). Run this code, specifically in the console:

diamonds %>% 
  group_by(cut)

# A tibble: 53,940 x 10
# Groups:   cut [5]
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39
# … with 53,930 more rows

Observe that now there is additional meta-data: # Groups:   cut [5] indicating that the grouping structure meta-data has been set based on the 5 possible values AKA levels of the categorical variable cut: "Fair", "Good", "Very Good", "Premium", "Ideal". On the other hand observe that the data has not changed: it is still a table of 53,940 × 10 values.


Only by combining a group_by() with another data wrangling operation, in this case summarize(), will the actual data be transformed.

diamonds %>% 
  group_by(cut) %>% 
  summarize(avg_price = mean(price))

# A tibble: 5 x 2
  cut       avg_price
  <ord>         <dbl>
1 Fair          4359.
2 Good          3929.
3 Very Good     3982.
4 Premium       4584.
5 Ideal         3458.

                If we would like to remove this group structure meta-data, we can pipe the resulting data frame into the ungroup() function. Observe how the # Groups: cut [5] meta-data is no longer present. Run this code, specifically in the console:

diamonds %>% 
  group_by(cut) %>% 
  ungroup()

# A tibble: 53,940 x 10
   carat cut       color clarity depth table price     x     y     z
   <dbl> <ord>     <ord> <ord>   <dbl> <dbl> <int> <dbl> <dbl> <dbl>
 1 0.23  Ideal     E     SI2      61.5    55   326  3.95  3.98  2.43
 2 0.21  Premium   E     SI1      59.8    61   326  3.89  3.84  2.31
 3 0.23  Good      E     VS1      56.9    65   327  4.05  4.07  2.31
 4 0.290 Premium   I     VS2      62.4    58   334  4.2   4.23  2.63
 5 0.31  Good      J     SI2      63.3    58   335  4.34  4.35  2.75
 6 0.24  Very Good J     VVS2     62.8    57   336  3.94  3.96  2.48
 7 0.24  Very Good I     VVS1     62.3    57   336  3.95  3.98  2.47
 8 0.26  Very Good H     SI1      61.9    55   337  4.07  4.11  2.53
 9 0.22  Fair      E     VS2      65.1    61   337  3.87  3.78  2.49
10 0.23  Very Good H     VS1      59.4    61   338  4     4.05  2.39
# … with 53,930 more rows

                Let’s now revisit the n() counting summary function we introduced in the previous section. For example, suppose we’d like to count how many flights departed each of the three airports in New York City:

                by_origin <- flights %>% 
                   group_by(origin) %>% 
                   summarize(count = n())
                 by_origin

# A tibble: 3 x 2
  origin  count
  <chr>   <int>
1 EWR    120835
2 JFK    111279
3 LGA    104662

We see that Newark ("EWR") had the most flights departing in 2013 followed by "JFK" and lastly by LaGuardia ("LGA"). Note there is a subtle but important difference between sum() and n(): while sum() returns the sum of a numerical variable, n() returns a count of the number of rows/observations.
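A minimal sketch contrasting the two (distance is a numerical variable in flights; the column names are illustrative): summing distance adds up the mileage of all flights in each group, while n() simply counts the rows in each group.

flights %>% 
   group_by(origin) %>% 
   summarize(num_flights = n(), total_distance = sum(distance))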

                4.4.1 Grouping by more than one variable


You are not limited to grouping by one variable! Say you wanted to know the number of flights leaving each of the three New York City airports for each month. We can do this by also grouping by a second variable month: group_by(origin, month). We see there are 36 rows in by_origin_monthly because there are 12 months times 3 airports (EWR, JFK, and LGA).

                by_origin_monthly <- flights %>% 
                   group_by(origin, month) %>% 
                   summarize(count = n())



                Why do we group_by(origin, month) and not group_by(origin) and then group_by(month)? Let’s investigate:

                by_origin_monthly_incorrect <- flights %>% 
                   group_by(origin) %>% 
                   group_by(month) %>% 
   summarize(count = n())
by_origin_monthly_incorrect



                What happened here is that the second group_by(month) overrode the group structure meta-data of the first group_by(origin), so that in the end we are only grouping by month. The lesson here is if you want to group_by() two or more variables, you should include all these variables in a single group_by() function call.

Learning check

4.5 mutate existing variables

FIGURE 4.5: Mutate diagram from Data Wrangling with dplyr and tidyr cheatsheet


Another common transformation of data is to create/compute new variables based on existing ones. For example, say you are more comfortable thinking of temperature in degrees Celsius °C and not degrees Fahrenheit °F. The formula to convert temperatures from °F to °C is:


\[
\text{temp in C} = \frac{\text{temp in F} - 32}{1.8}
\]


                We can apply this formula to the temp variable using the mutate() function, which takes existing variables and mutates them to create new ones.

weather <- weather %>% 
  mutate(temp_in_C = (temp-32)/1.8)
View(weather)

                Note that we have overwritten the original weather data frame with a new version that now includes the additional variable temp_in_C. In other words, the mutate() command outputs a new data frame which then gets saved over the original weather data frame. Furthermore, note how in mutate() we used temp_in_C = (temp-32)/1.8 to create a new variable temp_in_C.


Why did we overwrite the data frame weather instead of assigning the result to a new data frame like weather_new, and on the other hand why did we not overwrite temp, but instead create a new variable called temp_in_C? As a rough rule of thumb, as long as you are not losing original information that you might need later, it’s acceptable practice to overwrite existing data frames. On the other hand, had we used mutate(temp = (temp-32)/1.8) instead of mutate(temp_in_C = (temp-32)/1.8), we would have overwritten the original variable temp and lost its values.
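For completeness, a sketch of the more conservative alternative mentioned above, assigning to a new data frame instead of overwriting (weather_new is the hypothetical name used in the paragraph):

weather_new <- weather %>% 
   mutate(temp_in_C = (temp - 32) / 1.8)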


Let’s compute average monthly temperatures in both °F and °C using group_by() and summarize() code similar to that in the previous section.

summary_monthly_temp <- weather %>% 
  group_by(month) %>% 
  summarize(mean_temp_in_F = mean(temp, na.rm = TRUE), 
            mean_temp_in_C = mean(temp_in_C, na.rm = TRUE))
summary_monthly_temp

# A tibble: 12 x 3
   month mean_temp_in_F mean_temp_in_C
   <dbl>          <dbl>          <dbl>
 1     1           35.6           2.02
 2     2           34.3           1.26
 3     3           39.9           4.38
 4     4           51.7          11.0 
 5     5           61.8          16.6 
 6     6           72.2          22.3 
 7     7           80.1          26.7 
 8     8           74.5          23.6 
 9     9           67.4          19.7 
10    10           60.1          15.6 
11    11           45.0           7.22
12    12           38.4           3.58

                Let’s consider another example. Passengers are often frustrated when their flights depart late, but change their mood a bit if pilots can make up some time during the flight to get them to their destination close to the original arrival time. This is commonly referred to as “gain” and we will create this variable using the mutate() function.

                flights <- flights %>% 
                   mutate(gain = dep_delay - arr_delay)

                Let’s take a look at dep_delay, arr_delay, and the resulting gain variables for the first 5 rows in our new flights data frame:


  dep_delay arr_delay  gain
1         2        11    -9
2         4        20   -16
3         2        33   -31
4        -1       -18    17
5        -6       -25    19

                The flight in the first row departed 2 minutes late but arrived 11 minutes late, so its “gained time in the air” is actually a loss of 9 minutes, hence its gain is -9. Contrast this to the flight in the fourth row which departed a minute early (dep_delay of -1) but arrived 18 minutes early (arr_delay of -18), so its “gained time in the air” is 17 minutes, hence its gain is +17.


                Let’s look at summary measures of this gain variable and even plot it in the form of a histogram:

gain_summary <- flights %>% 
   summarize(min = min(gain, na.rm = TRUE), 
             q1 = quantile(gain, 0.25, na.rm = TRUE), 
             median = quantile(gain, 0.5, na.rm = TRUE), 
             q3 = quantile(gain, 0.75, na.rm = TRUE), 
             max = max(gain, na.rm = TRUE), 
             mean = mean(gain, na.rm = TRUE), 
             sd = sd(gain, na.rm = TRUE), 
             missing = sum(is.na(gain)))

                We’ve recreated the summary function we saw in Chapter 3 here using the summarize function in dplyr.

                ggplot(data = flights, mapping = aes(x = gain)) +
                   geom_histogram(color = "white", bins = 20)

                FIGURE 4.6: Histogram of gain variable


                4.6 arrange and sort rows


                One of the most common tasks people working with data would like to perform is sort the data frame’s rows in alphanumeric order of the values in a variable/column. For example, when calculating a median by hand requires you to first sort the data from the smallest to highest in value and then identify the “middle” value. The dplyr package has a function called arrange() that we will use to sort/reorder a data frame’s rows according to the values of the specified variable. This is often used after we have used the group_by() and summarize() functions as we will see.

                +

                Let’s suppose we were interested in determining the most frequent destination airports for all domestic flights departing from New York City in 2013:

                freq_dest <- flights %>% 
                   group_by(dest) %>% 
                   summarize(num_flights = n())
freq_dest
# A tibble: 105 x 2
   dest  num_flights
   <chr>       <int>
 ...
 9 BGR           375
10 BHM           297
# … with 95 more rows

Observe that by default the rows of the resulting freq_dest data frame are sorted in alphabetical order of the destination dest. Say instead we would like to see the same data, but sorted from the most to the least number of flights num_flights:

                freq_dest %>% 
                   arrange(num_flights)
                # A tibble: 105 x 2
   dest  num_flights
   <chr>       <int>
 ...
 9 JAC            25
10 BZN            36
# … with 95 more rows

This is actually giving us the opposite of what we are looking for: the rows are sorted with the least frequent destination airports displayed first. To switch the ordering to be descending instead of ascending, we use the desc() function, which is short for "descending":

                freq_dest %>% 
                   arrange(desc(num_flights))
                # A tibble: 105 x 2
   dest  num_flights
   <chr>       <int>
 ...
 9 MIA         11728
10 DCA          9705
# … with 95 more rows

                In other words, arrange() sorts in ascending order by default unless you override this default behavior by using desc().
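As a quick illustration (a sketch, not taken from the book's examples), the same pattern applies directly to flights itself: sorting by departure delay in descending order puts the most delayed flights first.

flights %>% 
  arrange(desc(dep_delay))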


                4.7 join data frames

Another common data transformation task is "joining," or "merging," two different datasets. For example, in the flights data frame the variable carrier lists the carrier code for the different flights. While the corresponding airline names for "UA" and "AA" might be somewhat easy to guess (United and American Airlines), what airlines have the codes "VX", "HA", and "B6"? This information is provided in a separate data frame, airlines.

                View(airlines)
We see that in airlines, carrier is the carrier code, while name is the full name of the airline company. Using this table, we can see that "VX", "HA", and "B6" correspond to Virgin America, Hawaiian Airlines, and JetBlue respectively. However, wouldn't it be nice to have all this information in a single data frame instead of two separate data frames? We can do this by "joining," i.e. "merging," the flights and airlines data frames.

Note that the values in the variable carrier in the flights data frame match the values in the variable carrier in the airlines data frame. In this case, we can use the variable carrier as a key variable to match the rows of the two data frames. Key variables are almost always identification variables that uniquely identify the observational units, as we saw in Subsection ??. This ensures that rows in both data frames are appropriately matched during the join. Hadley and Garrett (Grolemund and Wickham 2016) created the following diagram to help us understand how the different datasets are linked by various key variables:

                FIGURE 4.7: Data relationships in nycflights13 from R for Data Science

4.7.1 Matching “key” variable names

In both the flights and airlines data frames, the key variable we want to join/merge/match the rows of the two data frames by has the same name: carrier. We make use of the inner_join() function to join the two data frames, where the rows will be matched by the variable carrier.

                flights_joined <- flights %>% 
                   inner_join(airlines, by = "carrier")
                 View(flights)
                 View(flights_joined)
Observe that the flights and flights_joined data frames are identical except that flights_joined has an additional variable, name, whose values correspond to the airline company names drawn from the airlines data frame.

A visual representation of the inner_join() is given below (Grolemund and Wickham 2016). There are other types of joins available (such as left_join(), right_join(), full_join(), and anti_join()), but the inner_join() will solve nearly all of the problems you'll encounter in this book.

                FIGURE 4.8: Diagram of inner join from R for Data Science
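To make the contrast with the other join types concrete, here is a hypothetical sketch (not from the book's text) using left_join(), which keeps every row of the left-hand data frame even when there is no match, filling the unmatched columns with NA; an inner_join() would drop such rows instead.

flights %>% 
  left_join(airlines, by = "carrier")
# Every row of flights is kept; name would be NA for any carrier code
# not found in airlines (in nycflights13, all carrier codes do match).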


4.7.2 Different “key” variable names

Say instead you are interested in the destinations of all domestic flights departing NYC in 2013 and ask yourself:

                • “What cities are these airports in?”
                • “Is "ORD" Orlando?”
• “Where is "FLL"?”


                The airports data frame contains airport codes:

                View(airports)
However, looking at both the airports and flights data frames and the visual representation of the relations between these data frames in Figure 4.7 above, we see that in:

• the airports data frame, the airport code is in the variable faa
• the flights data frame, the airport codes are in the variables origin and dest
So to join these two data frames so that we can identify the destination cities, for example, our inner_join() operation will use the by = c("dest" = "faa") argument, which allows us to join two data frames where the key variable has a different name in each:

flights_with_airport_names <- flights %>% 
  inner_join(airports, by = c("dest" = "faa"))
View(flights_with_airport_names)
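A hypothetical variant (not shown in the book's text): to look up information about the origin airports instead, we would match flights' origin to airports' faa in the same way.

flights %>% 
  inner_join(airports, by = c("origin" = "faa"))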

                Let’s construct the sequence of commands that computes the number of flights from NYC to each destination, but also includes information about each destination airport:

named_dests <- flights %>%
   group_by(dest) %>%
   summarize(num_flights = n()) %>%
   arrange(desc(num_flights)) %>%
   inner_join(airports, by = c("dest" = "faa")) %>%
   rename(airport_name = name)
named_dests

 ...
 9 MIA         11728 Miami Intl           25.8 -80.3     8    -5 A     America…
10 DCA          9705 Ronald Reagan Wash…  38.9 -77.0    15    -5 A     America…
# … with 91 more rows

In case you didn't know, "ORD" is the airport code of Chicago O'Hare airport and "FLL" is the main airport in Fort Lauderdale, Florida, which we can now see in the airport_name variable in the resulting named_dests data frame.

4.7.3 Multiple “key” variables

Say instead we are in a situation where we need to join by multiple variables. For example, in Figure 4.7 above we see that in order to join the flights and weather data frames, we need more than one key variable: year, month, day, hour, and origin. This is because the combination of these 5 variables acts to uniquely identify each observational unit in the weather data frame: hourly weather recordings at each of the 3 NYC airports.

We achieve this by specifying a vector of key variables to join by, using the c() function for "combine" or "concatenate" that we saw earlier. Note that the individual variable names need to be wrapped in quotation marks:

flights_weather_joined <- flights %>%
  inner_join(weather, by = c("year", "month", "day", "hour", "origin"))
View(flights_weather_joined)

Learning check


                (LC4.14) What surprises you about the top 10 destinations from NYC in 2013?

4.7.4 Normal forms

The data frames included in the nycflights13 package are in a form that minimizes redundancy of data. For example, the flights data frame only saves the carrier code of the airline company; it does not include the actual name of the airline. The first row of flights has carrier equal to UA, but it does not include the airline name "United Air Lines Inc." The names of the airline companies are included in the name variable of the airlines data frame. In order to have the airline company name included in flights, we could join these two data frames as follows:

joined_flights <- flights %>% 
  inner_join(airlines, by = "carrier")
View(joined_flights)

We are capable of performing this join because each of the data frames has a key in common to relate one to another: the carrier variable in both the flights and airlines data frames. The key variable(s) that we join by are often the identification variables we mentioned previously.

                This is an important property of what’s known as normal forms of data. The process of decomposing data frames into less redundant tables without losing information is called normalization. More information is available on Wikipedia.
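A quick way to see the redundancy that normalization avoids (a sketch, not from the book): the airline names live in a 16-row data frame rather than being repeated for each of the 336,776 flights.

nrow(airlines)   # 16: each airline name is stored exactly once
nrow(flights)    # 336776: storing the full name here would repeat it thousands of times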

Learning check

(LC4.15) What are some advantages of data in normal forms? What are some disadvantages?

                4.8 Other verbs

Here are some other useful data wrangling verbs that might come in handy:

• select() only a subset of variables/columns
• rename() variables/columns to have new names
• Return only the top_n() values of a variable

                4.8.1 select variables

FIGURE 4.9: Select diagram from Data Wrangling with dplyr and tidyr cheatsheet

We've seen that the flights data frame in the nycflights13 package contains 19 different variables. You can identify the names of these 19 variables by running the glimpse() function from the dplyr package:

                glimpse(flights)
However, say you only need two of these variables, say carrier and flight. You can select() these two variables:

                flights %>% 
                   select(carrier, flight)
This function makes exploring data frames with a very large number of variables easier for humans to process by restricting consideration to only those we care about, like our example with carrier and flight above. This might make viewing the dataset using the View() spreadsheet viewer more digestible. However, as far as the computer is concerned, it doesn't care how many additional variables are in the data frame in question, so long as carrier and flight are included.

Let's say instead you want to drop, i.e. deselect, certain variables. For example, take the variable year in the flights data frame. This variable isn't quite a "variable" in the sense that all of its values are 2013, i.e. it doesn't change. Say you want to remove the year variable from the data frame; we can deselect year by using the - sign:

                flights_no_year <- flights %>% 
                   select(-year)
glimpse(flights_no_year)

Another way of selecting columns/variables is by specifying a range of columns:

                flight_arr_times <- flights %>% 
                   select(month:day, arr_time:sched_arr_time)
                 flight_arr_times
The select() function can also be used to reorder columns in combination with the everything() helper function. Let's suppose we'd like the hour, minute, and time_hour variables, which appear at the end of the flights dataset, to appear immediately after the year, month, and day variables, while keeping the rest of the variables. In the code below, everything() picks up all remaining variables:

                flights_reorder <- flights %>% 
  select(year, month, day, hour, minute, time_hour, everything())
glimpse(flights_reorder)

Lastly, the helper functions starts_with(), ends_with(), and contains() can be used to select variables/columns whose names match those conditions. For example:

                flights_begin_a <- flights %>% 
                   select(starts_with("a"))
                 flights_begin_a
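For instance, sketches of the other two helpers (these particular calls are our own examples, not the book's):

flights %>% select(ends_with("delay"))   # dep_delay and arr_delay
flights %>% select(contains("time"))     # every variable whose name contains "time"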
4.8.2 rename variables

Another useful function is rename(), which as you may have guessed renames one column to another name. Suppose we want dep_time and arr_time to be departure_time and arrival_time instead. Below we select() only the variables containing "time" from flights and then rename() these two columns, saving the result in flights_time_new:

          flights_time_new <- flights %>% 
             select(contains("time")) %>% 
             rename(departure_time = dep_time,
                    arrival_time = arr_time)
glimpse(flights_time_new)

          Note that in this case we used a single = sign within the rename(), for example departure_time = dep_time. This is because we are not testing for equality like we would using ==, but instead we want to assign a new variable departure_time to have the same values as dep_time and then delete the variable dep_time. It’s easy to forget if the new name comes before or after the equals sign. I usually remember this as “New Before, Old After” or NBOA.

          4.8.3 top_n values of a variable

We can also return the top n values of a variable using the top_n() function. For example, we can return a data frame of the top 10 destination airports using the example from Section 4.7.2. Observe that we set the number of values to return to n = 10 and wt = num_flights to indicate that we want the rows corresponding to the top 10 values of num_flights. See the help file for top_n() by running ?top_n for more information.

          named_dests %>% 
             top_n(n = 10, wt = num_flights)
Let's further arrange() these results in descending order of num_flights:

          named_dests  %>% 
             top_n(n = 10, wt = num_flights) %>% 
             arrange(desc(num_flights))

          Learning check

(LC4.16) What are some ways to select all three of the dest, air_time, and distance variables from flights? Give the code showing how to do this in at least three different ways.

(LC4.17) How could one use starts_with, ends_with, and contains to select columns from the flights data frame? Provide three different examples in total: one for starts_with, one for ends_with, and one for contains.

(LC4.18) Why might we want to use the select function on a data frame?

(LC4.19) Create a new data frame that shows the top 5 airports with the largest arrival delays from NYC in 2013.


          4.9 Conclusion

          4.9.1 Summary table

Let's recap our data wrangling verbs in Table 4.1. Using these verbs and the pipe %>% operator from Section 4.1, you'll be able to write easily legible code to perform almost all the data wrangling necessary for the rest of this book.

TABLE 4.1: Summary of data wrangling verbs

Verb          Data wrangling operation
filter()      Pick out a subset of rows
summarize()   Summarize many values into one using a summary statistic function
group_by()    Add grouping structure to the rows of a data frame
mutate()      Create new variables by mutating existing ones
arrange()     Arrange rows of a data frame in ascending (default) or descending order of a variable
inner_join()  Join/merge two data frames, matching rows by a key variable

          Learning check

(LC4.20) Let's now put your newly acquired data wrangling skills to the test!

          An airline industry measure of a passenger airline’s capacity is the available seat miles, which is equal to the number of seats available multiplied by the number of miles or kilometers flown summed over all flights. So for example say an airline had 2 flights using a plane with 10 seats that flew 500 miles and 3 flights using a plane with 20 seats that flew 1000 miles, the available seat miles would be 2 \(\times\) 10 \(\times\) 500 \(+\) 3 \(\times\) 20 \(\times\) 1000 = 70,000 seat miles.
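As a quick sanity check of the arithmetic in this toy example (a sketch only, not a solution to the exercise):

2 * 10 * 500 + 3 * 20 * 1000
#> [1] 70000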

          Using the datasets included in the nycflights13 package, compute the available seat miles for each airline sorted in descending order. After completing all the necessary data wrangling steps, the resulting data frame should have 16 rows (one for each airline) and 2 columns (airline name and available seat miles). Here are some hints:


            4.9.2 Additional resources

FIGURE 4.10: Data Transformation with dplyr cheatsheet

On top of the data wrangling verbs and examples we presented in this section, if you'd like to see more examples of using the dplyr package for data wrangling, check out Chapter 5 of Garrett Grolemund and Hadley Wickham's book (Grolemund and Wickham 2016).


                Chapter 5 Data Importing & “Tidy” Data

In Subsection 2.2.1 we introduced the concept of a data frame: a rectangular spreadsheet-like representation of data in R where the rows correspond to observations and the columns correspond to variables describing each observation. In Section 2.4, we started exploring our first data frame: the flights data frame included in the nycflights13 package. In Chapter 3 we created visualizations based on the data included in flights and other data frames such as weather. In Chapter 4, we learned how to wrangle data, in other words how to take existing data frames and transform and modify them to suit our analysis goals.

In this final chapter of the "Data Science via the tidyverse" portion of the book, we extend some of these ideas by discussing a type of data formatting called "tidy" data. You will see that having data stored in "tidy" format is about more than what the colloquial definition of the term "tidy" might suggest: having your data "neatly organized." Instead, we define the term "tidy" in a more rigorous fashion, outlining a set of rules by which data can be stored, and the implications of these rules for analyses.

Although knowledge of this type of data formatting was not necessary for our treatment of data visualization in Chapter 3 and data wrangling in Chapter 4, since all the data was already in "tidy" format, we'll now see this format is actually essential to using the tools we covered in these two chapters. Furthermore, it will also be useful for all subsequent chapters in this book when we cover regression and statistical inference. First, however, we'll show you how to import spreadsheet data for use in R.

                Needed packages

                Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). If needed, read Section 2.3 for information on how to install and load R packages.
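The library() calls themselves are not shown in this excerpt; based on the packages used in this chapter (readr, tidyr, dplyr, ggplot2, nycflights13, and fivethirtyeight), they presumably look something like the following sketch:

library(dplyr)
library(ggplot2)
library(readr)
library(tidyr)
library(nycflights13)
library(fivethirtyeight)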


                5.1 Importing data

Up to this point, we've almost entirely used data stored inside of an R package. Say instead your own data is saved on your computer or somewhere online. How can you analyze this data in R? Spreadsheet data is often saved in one of the following formats:

• A Comma Separated Values .csv file. You can think of a .csv file as a bare-bones spreadsheet where:
  • Each line in the file corresponds to one row of data/one observation.
  • Values for each line are separated with commas. In other words, the values of different variables are separated by commas.
  • The first line is often, but not always, a header row indicating the names of the columns/variables.
• An Excel .xlsx file. This format is based on Microsoft's proprietary Excel software. As opposed to bare-bones .csv files, .xlsx Excel files contain a lot of meta-data, or put more simply, data about the data. (Recall we saw a previous example of meta-data in Section 4.4 when adding "group structure" meta-data to a data frame by using the group_by() verb.) Some examples of spreadsheet meta-data include the use of bold and italic fonts, colored cells, different column widths, and formula macros.
• A Google Sheets file, which is a "cloud" or online-based way to work with a spreadsheet. Google Sheets allows you to download your data in both comma separated values .csv and Excel .xlsx formats: go to the Google Sheets menu bar -> File -> Download as -> select "Microsoft Excel" or "Comma-separated values."

We'll cover two methods for importing .csv and .xlsx spreadsheet data in R: one using the R console and the other using RStudio's graphical user interface, abbreviated as "GUI."

5.1.1 Using the console

First, let's import a Comma Separated Values .csv file of data directly off the internet. The .csv file dem_score.csv, accessible at https://moderndive.com/data/dem_score.csv, contains ratings of the level of democracy in different countries spanning 1952 to 1992. Let's use the read_csv() function from the readr package to read it off the web, import it into R, and save it in a data frame called dem_score:

                library(readr)
                 dem_score <- read_csv("https://moderndive.com/data/dem_score.csv")
                 dem_score
In this dem_score data frame, the minimum value of -10 corresponds to a highly autocratic nation, whereas a value of 10 corresponds to a highly democratic nation. We'll revisit the dem_score data frame in a case study in the upcoming Section 5.3.

Note that the read_csv() function included in the readr package is different than the read.csv() function that comes installed with R by default. While the difference in the names might seem near meaningless (an _ instead of a .), the read_csv() function is in our opinion easier to use since it can more easily read data off the web and generally imports data at a much faster speed.
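For comparison, a quick sketch of the base R equivalent (this example is ours, not the book's): read.csv() also accepts a URL, but returns a plain data.frame rather than a tibble and is typically slower on large files.

dem_score_base <- read.csv("https://moderndive.com/data/dem_score.csv")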

5.1.2 Using RStudio's interface

                Let’s read in the exact same data saved in Excel format, but this time via RStudio’s graphical interface instead of via the R console. First download the Excel file dem_score.xlsx by clicking here, then

1. Go to the Files panel of RStudio.
2. Navigate to the directory, i.e. folder on your computer, where the downloaded dem_score.xlsx Excel file is saved.
3. Click on dem_score.xlsx.
4. Click "Import Dataset…"

                At this point you should see an image like this:

After clicking on the "Import" button on the bottom right of RStudio, RStudio will save this spreadsheet's data in a data frame called dem_score and display its contents in the spreadsheet viewer. Furthermore, note in the bottom right of the above image there exists a "Code Preview": you can copy and paste this code to reload your data again later automatically, instead of repeating the above manual point-and-click process.



                5.2 Tidy data

Let's now switch gears and learn about the concept of "tidy" data format by starting with a motivating example. Let's consider the drinks data frame included in the fivethirtyeight package. Run the following:

                drinks
                # A tibble: 193 x 5
                    country      beer_servings spirit_servings wine_servings total_litres_of_pur…
    <chr>                <int>           <int>         <int>                <dbl>
 ...
 9 Australia              261              72           212                 10.4
10 Austria                279              75           191                  9.7
# … with 183 more rows
After reading the help file by running ?drinks, we see that drinks is a data frame containing results from a survey of the average number of servings of beer, spirits, and wine consumed in 193 countries. This data was originally reported on the data journalism website FiveThirtyEight.com in Mona Chalabi's article "Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?"

Let's apply some of the data wrangling verbs we learned in Chapter 4 to the drinks data frame. Let's

1. filter() the drinks data frame to only consider 4 countries (the United States, China, Italy, and Saudi Arabia), then
2. select() all columns except total_litres_of_pure_alcohol by using the - sign, then
3. rename() the variables beer_servings, spirit_servings, and wine_servings to beer, spirit, and wine, respectively,

                and save the resulting data frame in drinks_smaller.

                drinks_smaller <- drinks %>% 
                   filter(country %in% c("USA", "China", "Italy", "Saudi Arabia")) %>% 
                   select(-total_litres_of_pure_alcohol) %>% 
   rename(beer = beer_servings, spirit = spirit_servings, wine = wine_servings)
drinks_smaller
# A tibble: 4 x 4
  country       beer spirit  wine
  <chr>        <int>  <int> <int>
1 China           79    192     8
2 Italy           85     42   237
3 Saudi Arabia     0      5     0
4 USA            249    158    84
Using the drinks_smaller data frame, how would we create the side-by-side AKA dodged barplot in Figure 5.1? Recall we saw barplots displaying two categorical variables in Section 3.8.3.

FIGURE 5.1: Alcohol consumption in 4 countries.

                Let’s break down the Grammar of Graphics:

1. The categorical variable country with four levels (China, Italy, Saudi Arabia, USA) would have to be mapped to the x-position of the bars.
2. The numerical variable servings would have to be mapped to the y-position of the bars, in other words the height of the bars.
3. The categorical variable type with three levels (beer, spirit, wine) would have to be mapped to the fill color of the bars.
Observe however that drinks_smaller has three separate variables for beer, spirit, and wine, whereas in order to recreate the side-by-side AKA dodged barplot in Figure 5.1 we would need a single variable type with three possible values (beer, spirit, and wine), which we would then map to the fill aesthetic. In other words, for us to be able to create the barplot in Figure 5.1, our data frame would have to look like this:

                drinks_smaller_tidy
                # A tibble: 12 x 3
                    country      type   servings
    <chr>        <chr>     <int>
  1 China        beer         79
  2 Italy        beer         85
  3 Saudi Arabia beer          0
  4 USA          beer        249
  5 China        spirit      192
  6 Italy        spirit       42
  7 Saudi Arabia spirit        5
  8 USA          spirit      158
  9 China        wine          8
 10 Italy        wine        237
 11 Saudi Arabia wine          0
 12 USA          wine         84
Let's compare drinks_smaller_tidy with the drinks_smaller data frame from earlier:

drinks_smaller
# A tibble: 4 x 4
  country       beer spirit  wine
  <chr>        <int>  <int> <int>
1 China           79    192     8
2 Italy           85     42   237
3 Saudi Arabia     0      5     0
4 USA            249    158    84

Observe that while drinks_smaller and drinks_smaller_tidy are both rectangular in shape and contain the same 12 numerical values (3 alcohol types \(\times\) 4 countries), they are formatted differently. drinks_smaller is formatted in what's known as "wide" format, whereas drinks_smaller_tidy is formatted in what's known as "long/narrow" format. In the context of using R, long/narrow format is also known as "tidy" format. Furthermore, in order to use the ggplot2 and dplyr packages for data visualization and data wrangling, your input data frames must be in "tidy" format. So all non-"tidy" data must be converted to "tidy" format first.

Before we show you how to convert non-"tidy" data frames like drinks_smaller to "tidy" data frames like drinks_smaller_tidy, let's go over the explicit definition of "tidy" data.


                5.2.1 Definition of “tidy” data

                You have surely heard the word “tidy” in your life:

What does it mean for your data to be "tidy"? While "tidy" has a clear English meaning of "organized," "tidy" in the context of data science using R means that your data follows a standardized format. We will follow Hadley Wickham's definition of tidy data here (Wickham 2014):

                A dataset is a collection of values, usually either numbers (if quantitative) or strings AKA text data (if qualitative). Values are organised in two ways. Every value belongs to a variable and an observation. A variable contains all values that measure the same underlying attribute (like height, temperature, duration) across units. An observation contains all values measured on the same unit (like a person, or a day, or a city) across attributes.

FIGURE 5.2: Tidy data graphic from R for Data Science.

For example, say you have the following table of stock prices in Table 5.1:


TABLE 5.1: Stock Prices (Non-Tidy Format)

Although the data are neatly organized in a rectangular spreadsheet-type format, they are not in tidy format because while there are three variables corresponding to three unique pieces of information (Date, Stock Name, and Stock Price), there are not three columns. In "tidy" data format each variable should be its own column, as shown in Table 5.2. Notice that both tables present the same information, but in different formats.

TABLE 5.2: Stock Prices (Tidy Format)

Now we have the requisite three columns Date, Stock Name, and Stock Price. On the other hand, consider the data in Table 5.3.

TABLE 5.3: Date, Boeing Price, Weather Data

In this case, even though the variable "Boeing Price" occurs just like in our non-"tidy" data in Table 5.1, the data is "tidy" since there are three variables corresponding to three unique pieces of information: Date, Boeing stock price, and the weather that particular day.

Learning check

          (LC5.1) What are common characteristics of “tidy” data frames?


          (LC5.2) What makes “tidy” data frames useful for organizing data?


5.2.2 Converting to "tidy" data

In this book so far, you've only seen data frames that were already in "tidy" format. Furthermore, for the rest of this book, you'll mostly only see data frames that are already in "tidy" format as well. This is not always the case, however, with data in the wild. If your original data frame is in wide, i.e. non-"tidy," format and you would like to use the ggplot2 package for data visualization or the dplyr package for data wrangling, you will first have to convert it to "tidy" format using the gather() function in the tidyr package (Wickham and Henry 2018).


          Going back to our drinks_smaller data frame from earlier:

          drinks_smaller
          # A tibble: 4 x 4
             country       beer spirit  wine
  <chr>        <int>  <int> <int>
1 China           79    192     8
2 Italy           85     42   237
3 Saudi Arabia     0      5     0
4 USA            249    158    84

We convert it to "tidy" format by using the gather() function from the tidyr package as follows:

          drinks_smaller_tidy <- drinks_smaller %>% 
             gather(key = type, value = servings, -country)
           drinks_smaller_tidy
# A tibble: 12 x 3
   country      type   servings
   <chr>        <chr>     <int>
 ...
10 Italy        wine        237
11 Saudi Arabia wine          0
12 USA          wine         84

We set the arguments to gather() as follows:

1. key is the name of the column/variable in the new "tidy" data frame that contains the column names of the original data frame that you want to tidy. Observe how we set key = type, and in the resulting drinks_smaller_tidy the column type contains the three types of alcohol: beer, spirit, and wine.
2. value is the name of the column/variable in the "tidy" data frame that contains the values from the rows and columns of the original data frame you want to tidy. Observe how we set value = servings, and in the resulting drinks_smaller_tidy the column servings contains the 4 \(\times\) 3 = 12 numerical values.
3. The third argument is the columns you either want to or don't want to tidy. Observe how we set this to -country, indicating that we don't want to tidy the country variable in drinks_smaller and rather only beer, spirit, and wine.
          -

The third argument is a little nuanced, so let’s consider another example. The code below is very similar, but now the third argument specifies which columns we want to tidy, c(beer, spirit, wine), instead of the columns we don’t want to tidy, -country. Note the use of c() to create a vector of the columns in drinks_smaller that we’d like to tidy. If you run the code below, you’ll see that the resulting drinks_smaller_tidy is the same.

drinks_smaller_tidy <- drinks_smaller %>% 
  gather(key = type, value = servings, c(beer, spirit, wine))
drinks_smaller_tidy

With our drinks_smaller_tidy “tidy” format data frame, we can now produce a side-by-side AKA dodged barplot using geom_col() and not geom_bar(), since we would like to map the servings variable to the y-aesthetic of the bars.

          ggplot(drinks_smaller_tidy, aes(x=country, y=servings, fill=type)) +
             geom_col(position = "dodge")

          Converting “wide” format data to “tidy” format often confuses new R users. The only way to learn to get comfortable with the gather() function is with practice, practice, and more practice. For example, see the examples in the bottom of the help file for gather() by running ?gather. We’ll show another example of using gather() to convert a “wide” formatted data frame to “tidy” format in Section 5.3. For other examples of converting a dataset into “tidy” format, check out the different functions available for data tidying and a case study using data from the World Health Organization in R for Data Science (Grolemund and Wickham 2016).
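If you are curious about going in the other direction, here is a minimal sketch (not required for this chapter) using tidyr’s spread() function, which performs the reverse of gather(): it spreads a key column and a value column back out into wide format. The name drinks_smaller_wide is just an illustrative choice.

# Reverse the gather() from above: spread the type/servings pair back out
# into one column per type of alcohol, recovering the original wide format.
drinks_smaller_wide <- drinks_smaller_tidy %>% 
  spread(key = type, value = servings)
drinks_smaller_wide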

          Learning check


(LC5.3) Take a look at the airline_safety data frame included in the fivethirtyeight package. Run the following:

airline_safety

After reading the help file by running ?airline_safety, we see that airline_safety is a data frame containing information on different airline companies’ safety records. This data was originally reported on the data journalism website FiveThirtyEight.com in Nate Silver’s article “Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?”. Let’s ignore the incl_reg_subsidiaries and avail_seat_km_per_week variables for simplicity:

airline_safety_smaller <- airline_safety %>% 
  select(-c(incl_reg_subsidiaries, avail_seat_km_per_week))
airline_safety_smaller
# A tibble: 56 x 7
   airline incidents_85_99 fatal_accidents… fatalities_85_99 incidents_00_14
   <chr>             <int>            <int>            <int>           <int>
 1 Aer Li…               2                0                0               0
 2 Aerofl…              76               14              128               6
 3 Aeroli…               6                0                0               1
 4 Aerome…               3                1               64               5
 5 Air Ca…               2                0                0               2
 6 Air Fr…              14                4               79               6
 7 Air In…               2                1              329               4
 8 Air Ne…               3                0                0               5
 9 Alaska…               5                0                0               5
10 Alital…               7                2               50               4
# … with 46 more rows, and 2 more variables: fatal_accidents_00_14 <int>,
#   fatalities_00_14 <int>

          This data frame is not in “tidy” format. How would you convert this data frame to be in “tidy” format, in particular so that it has a variable incident_type_years indicating the incident type/year and a variable count of the counts?


          5.2.3 nycflights13 package

Recall the nycflights13 package with data about all domestic flights departing from New York City in 2013 that we introduced in Section 2.4 and used extensively in Chapter 3 on data visualization and Chapter 4 on data wrangling. Let’s revisit the flights data frame by running View(flights). We saw that flights has a rectangular shape, with each of its 336,776 rows corresponding to a flight and each of its 22 columns corresponding to different characteristics/measurements of each flight. This matches exactly with our definition of “tidy” data from above:

          1. Each variable forms a column.
          2. Each observation forms a row.
3. Each type of observational unit forms a table.
Observational units:

Recall that we also saw in Section 2.4.3 that the observational unit for the flights data frame is an individual flight. In other words, the rows of the flights data frame refer to characteristics/measurements of individual flights. Also included in the nycflights13 package are other data frames with their rows representing different observational units (Wickham 2018):

• airlines: translation between two letter IATA carrier codes and names (16 in total), i.e. the observational unit is an airline company.
• planes: construction information about each of 3,322 planes used, i.e. the observational unit is an aircraft.
• weather: hourly meteorological data (about 8,705 observations) for each of the three NYC airports, i.e. the observational unit is an hourly measurement.
• airports: airport names and locations, i.e. the observational unit is an airport.

          The organization of this data follows the third “tidy” data property: observations corresponding to the same observational unit should be saved in the same table/data frame. Another example involves a spreadsheet of all students enrolled in a university along with information about them, such as name, gender, and date of birth. Each row represents an individual student, which is the observational unit in question.
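To make this concrete, here is a minimal sketch of such a spreadsheet as a data frame; the students data frame and every value in it are hypothetical, made up purely for illustration:

library(tibble)

# Hypothetical students data frame: each row is one student (the observational
# unit); name, gender, and date_of_birth are variables describing that student.
students <- tibble(
  name          = c("Ana", "Bruno", "Carla"),
  gender        = c("F", "M", "F"),
  date_of_birth = c("2000-03-14", "1999-11-02", "2000-03-14")
)
students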

Identification vs measurement variables:

There is a subtle difference between the kinds of variables that you will encounter in data frames: measurement variables and identification variables. The airports data frame you worked with above contains both these types of variables. Recall that in airports the observational unit is an airport, and thus each row corresponds to one particular airport. Let’s pull them apart using the glimpse() function:

glimpse(airports)
Observations: 1,458
Variables: 8
$ faa   <chr> "04G", "06A", "06C", "06N", "09J", "0A9", "0G6", "0G7", "0P2", …
$ name  <chr> "Lansdowne Airport", "Moton Field Municipal Airport", "Schaumbu…
$ lat   <dbl> 41.1, 32.5, 42.0, 41.4, 31.1, 36.4, 41.5, 42.9, 39.8, 48.1, 39.…
$ lon   <dbl> -80.6, -85.7, -88.1, -74.4, -81.4, -82.2, -84.5, -76.8, -76.6, …
$ alt   <int> 1044, 264, 801, 523, 11, 1593, 730, 492, 1000, 108, 409, 875, 1…
$ tz    <dbl> -5, -6, -6, -5, -5, -5, -5, -5, -5, -8, -5, -6, -5, -5, -5, -5,…
$ dst   <chr> "A", "A", "A", "A", "A", "A", "A", "A", "U", "A", "A", "U", "A"…
$ tzone <chr> "America/New_York", "America/Chicago", "America/Chicago", "Amer…

The variables faa and name are what we will call identification variables: variables that uniquely identify each observational unit. They are mainly used to provide a unique name to each observational unit, thereby allowing us to uniquely identify them. faa gives the unique code provided by the FAA for that airport, while the name variable gives the longer, more natural name of the airport. The remaining variables (lat, lon, alt, tz, dst, tzone) are often called measurement or characteristic variables: variables that describe properties of each observational unit, in other words each observation in each row. For example, lat and lon describe the latitude and longitude of each airport.
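To see this split for yourself, here is a minimal sketch using the select() function from dplyr; this particular pairing of select() calls is just an illustration, not from the text:

library(dplyr)
library(nycflights13)

# Identification variables: uniquely label each airport
airports %>% select(faa, name)

# Measurement/characteristic variables: describe properties of each airport
airports %>% select(lat, lon, alt, tz, dst, tzone)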

So in our above example of a spreadsheet of all students enrolled at a university, email address could be treated as an identification variable since it uniquely identifies each observational unit, i.e. each student, while date of birth could not, since it is possible (and highly probable) that two students share the same birthday.

          Furthermore, sometimes a single variable might not be enough to uniquely identify each observational unit: combinations of variables might be needed (see Learning Check below). While it is not an absolute rule, for organizational purposes it is considered good practice to have your identification variables in the far left-most columns of your data frame.

The organization of the information into these five data frames follows the third “tidy” data property: observations corresponding to the same observational unit should be saved in the same table, i.e. data frame. You could think of this property as the old English expression: “birds of a feather flock together.”
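A quick way to convince yourself of these different observational units is to look at how many rows each data frame has; a minimal sketch, assuming the nycflights13 package is loaded:

# One row per observational unit: carrier, aircraft, hourly measurement, airport.
nrow(airlines)
nrow(planes)
nrow(weather)
nrow(airports)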


          5.3 Case study: Democracy in Guatemala

In this section, we’ll show you another example of how to convert a data frame that isn’t in “tidy” format, i.e. “wide” format, to a data frame that is in “tidy” format, i.e. “long/narrow” format. We’ll do this using the gather() function from the tidyr package again. Furthermore, we’ll make use of some of the ggplot2 data visualization and dplyr data wrangling tools you learned in Chapters 3 and 4.
          Let’s use the dem_score data frame we imported in Section 5.1, but focus on only data corresponding to Guatemala.

          guat_dem <- dem_score %>% 
             filter(country == "Guatemala")
           guat_dem
# A tibble: 1 x 10
  country   `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` `1992`
  <chr>      <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 Guatemala      2     -6     -5      3      1     -3     -7      3      3

Now let’s produce a time-series plot showing how the democracy scores have changed over the 40 years from 1952 to 1992 for Guatemala. Recall that we saw time-series plots in Section 3.4 on creating linegraphs using geom_line(). Let’s lay out the Grammar of Graphics we saw in Section 3.1.

First, we know we need to set data = guat_dem and use a geom_line() layer, but what is the aesthetic mapping of variables? We’d like to see how the democracy score has changed over the years, so we need to map:

• year to the x-position aesthetic and
• democracy_score to the y-position aesthetic

Now we are stuck in a predicament, much like with our drinks_smaller example in Section 5.2. We see that we have a variable named country, but its only value is "Guatemala". We have other variables denoted by different year values. Unfortunately, the guat_dem data frame is not “tidy” and hence is not in the appropriate format to apply the Grammar of Graphics, and thus we cannot use the ggplot2 package. We need to take the values of the columns corresponding to years in guat_dem and convert them into a new “key” variable called year. Furthermore, we’d like to take the democracy scores on the inside of the table and turn them into a new “value” variable called democracy_score. Our resulting data frame will thus have three columns: country, year, and democracy_score.

Recall that the gather() function in the tidyr package can complete this task for us:

guat_dem_tidy <- guat_dem %>% 
  gather(key = year, value = democracy_score, -country) 
guat_dem_tidy
          # A tibble: 9 x 3
             country   year  democracy_score
             <chr>     <chr>           <dbl>
1 Guatemala 1952                2
2 Guatemala 1957               -6
3 Guatemala 1962               -5
4 Guatemala 1967                3
5 Guatemala 1972                1
6 Guatemala 1977               -3
7 Guatemala 1982               -7
8 Guatemala 1987                3
9 Guatemala 1992                3

We set the arguments to gather() as follows:

1. key is the name of the column/variable in the new “tidy” data frame that contains the column names of the original data frame that you want to tidy. Observe how we set key = year and in the resulting guat_dem_tidy the column year contains the years when Guatemala’s democracy scores were measured.
2. value is the name of the column/variable in the “tidy” data frame that contains the rows and columns of values in the original data frame you want to tidy. Observe how we set value = democracy_score and in the resulting guat_dem_tidy the column democracy_score contains the 1 \(\times\) 9 = 9 democracy scores.
3. The third argument is the columns you either want to or don’t want to tidy. Observe how we set this to -country, indicating that we don’t want to tidy the country variable in guat_dem and rather only 1952 through 1992.

However, observe in the output for guat_dem_tidy that the year variable is of type chr or character. Before we can plot this variable on the x-axis, we need to convert it into a numerical variable using the as.numeric() function within the mutate() function, which we saw in Section 4.5 on mutating existing variables to create new ones.

guat_dem_tidy <- guat_dem_tidy %>% 
  mutate(year = as.numeric(year))

We can now create the plot to show how the democracy score of Guatemala changed from 1952 to 1992 using a geom_line():

ggplot(guat_dem_tidy, aes(x = year, y = democracy_score)) +
  geom_line() +
  labs(x = "Year", y = "Democracy Score", title = "Democracy score in Guatemala 1952-1992")

FIGURE 5.3: Guatemala’s democracy score ratings from 1952 to 1992
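To tie the steps above together, here is a minimal sketch (not from the text) that performs the entire wrangle-then-plot sequence in a single pipeline. It assumes dem_score and the dplyr, tidyr, readr, and ggplot2 packages are loaded as at the start of the chapter; readr’s parse_number() is used as an alternative to as.numeric() for converting year, since it also strips any non-numeric characters.

# Filter to Guatemala, convert to "tidy" format, convert year to a number,
# and pipe the result straight into ggplot().
dem_score %>% 
  filter(country == "Guatemala") %>% 
  gather(key = year, value = democracy_score, -country) %>% 
  mutate(year = parse_number(year)) %>% 
  ggplot(aes(x = year, y = democracy_score)) +
  geom_line() +
  labs(x = "Year", y = "Democracy Score")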

Learning check

          5.4 Conclusion

          5.4.1 tidyverse package

Notice at the beginning of the chapter we loaded the following four packages, which are four of the most frequently used R packages for data science:

          library(dplyr)
           library(ggplot2)
           library(readr)
           library(tidyr)

          There is a much quicker way to load these packages than by individually loading them as we did above: by installing and loading the tidyverse package. The tidyverse package acts as an “umbrella” package whereby installing/loading it will install/load multiple packages at once for you. So after installing the tidyverse package as you would a normal package, running this:

          library(tidyverse)

          would be the same as running this:

          library(ggplot2)
library(dplyr)
library(tidyr)
library(readr)
library(purrr)
library(tibble)
library(stringr)
library(forcats)

You’ve seen the first four of these packages: ggplot2 for data visualization, dplyr for data wrangling, tidyr for converting data to “tidy” format, and readr for importing spreadsheet data into R. The remaining packages (purrr, tibble, stringr, and forcats) are left for a more advanced book; check out R for Data Science to learn about these packages.

The tidyverse “umbrella” package gets its name from the fact that all functions in all its constituent packages are designed so that all input/argument data frames are in “tidy” format and all output data frames are in “tidy” format as well. This standardization of input and output data frames makes transitions between the various functions in these packages as seamless as possible.

          5.4.2 Optional: Normal forms of data

The datasets included in the nycflights13 package are in a form that minimizes redundancy of data. We will see that there are ways to merge (or join) the different tables together easily. We are capable of doing so because each of the tables has keys in common to relate one to another. This is an important property of normal forms of data. The process of decomposing data frames into less redundant tables without losing information is called normalization. More information is available on Wikipedia.

We saw an example of this above with the airlines dataset. While the flights data frame could also include a column with the names of the airlines instead of the carrier code, this would be repetitive since there is a unique mapping of the carrier code to the name of the airline/carrier.

Below is an example showing how to join the airlines data frame together with the flights data frame by linking together the two datasets via a common key of "carrier". Note that this “joined” data frame is assigned to a new data frame called joined_flights. The key variable that we frequently join by is one of the identification variables mentioned above.

joined_flights <- inner_join(x = flights, y = airlines, by = "carrier")
View(joined_flights)

          If we View() this dataset, we see a new variable has been created called name. (We will see in Subsection 4.8.2 ways to change name to a more descriptive variable name.) More discussion about joining data frames together will be given in Chapter 4. We will see there that the names of the columns to be linked need not match as they did here with "carrier".
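As a preview of the renaming mentioned above, here is a minimal sketch using dplyr’s rename() function; the new variable name airline_name is just an illustrative choice, not necessarily the one used later in the book:

# Give the joined-in `name` column a more descriptive variable name.
joined_flights <- joined_flights %>% 
  rename(airline_name = name)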

Learning check

(LC5.8) What are some advantages of data in normal forms? What are some disadvantages?

5.4.3 Additional resources

          An R script file of all R code used in this chapter is available here.

If you want to learn more about using the readr and tidyr packages, we suggest that you check out RStudio’s “Data Import” cheatsheet. You can access this cheatsheet by going to RStudio’s cheatsheet page and searching for “Data Import Cheat Sheet”.

FIGURE 5.4: Data Import cheatsheet

5.4.4 What’s to come?

Congratulations! We’ve completed the “Data Science via the tidyverse” portion of this book! We’ll now move to the “data modeling” portion in Chapters 6 and 7, where you’ll leverage your data visualization and wrangling skills to model relationships between different variables in data frames. However, we’re going to leave Chapter 11 on “Inference for Regression” until after we’ve covered statistical inference.

FIGURE 5.5: ModernDive flowchart - On to Part II!

diff --git a/docs/6-appendixD.html b/docs/6-appendixD.html
deleted file mode 100644
index 7afc054fe..000000000
--- a/docs/6-appendixD.html
+++ /dev/null

          Chapter 6 Learning Check Solutions

6.1 Chapter 2 Solutions

library(dplyr)
library(ggplot2)
library(nycflights13)

          (LC2.1) What does any ONE row in this flights dataset refer to?

• A. Data on an airline
• B. Data on a flight
• C. Data on an airport
• D. Data on multiple flights

Solution: This is data on a flight. Not a flight path! Example:

• a flight path would be United 1545 to Houston
• a flight would be United 1545 to Houston at a specific date/time. For example: 2013/1/1 at 5:15am.

          (LC2.2) What are some examples in this dataset of categorical variables? What makes them different than quantitative variables?

Solution: Hint: Type ?flights in the console to see what all the variables mean!

• Categorical:
  • carrier: the company
  • dest: the destination
  • flight: the flight number. Even though this is a number, it’s simply a label. Example: United 1545 is not less than United 1714.
• Quantitative:
  • distance: the distance in miles
  • time_hour: time

          (LC2.3) What does int, dbl, and chr mean in the output above?

Solution:

• int: integer. Used to count things, i.e. a discrete value. Ex: the # of cars parked in a lot
• dbl: double. Used to measure things, i.e. a continuous value. Ex: your height in inches
• chr: character, i.e. text
          6.2 Chapter 3 Solutions

library(nycflights13)
library(ggplot2)
library(dplyr)

          (LC3.1) Take a look at both the flights and alaska_flights data frames by running View(flights) and View(alaska_flights) in the console. In what respect do these data frames differ?

          -

          Solution: flights contains all flight data, while alaska_flights contains only data from Alaskan carrier “AS”. We can see that flights has 336776 rows while alaska_flights has only 714

          -

          (LC3.2) What are some practical reasons why dep_delay and arr_delay have a positive relationship?

          -

          Solution: The later a plane departs, typically the later it will arrive.

          -

          (LC3.3) What variables (not necessarily in the flights data frame) would you expect to have a negative correlation (i.e. a negative relationship) with dep_delay? Why? Remember that we are focusing on numerical variables here.

          -

Solution: An example in the weather dataset is visibility, which measures visibility in miles. As visibility increases, we would expect departure delays to decrease.

          -

          (LC3.4) Why do you believe there is a cluster of points near (0, 0)? What does (0, 0) correspond to in terms of the Alaskan flights?

          -

          Solution: The point (0,0) means no delay in departure nor arrival. From the point of view of Alaska airlines, this means the flight was on time. It seems most flights are at least close to being on time.

          -

          (LC3.5) What are some other features of the plot that stand out to you?

          -

          Solution: Different people will answer this one differently. One answer is most flights depart and arrive less than an hour late.

          -

          (LC3.6) Create a new scatterplot using different variables in the alaska_flights data frame by modifying the example above.

          -

Solution: Many possibilities for this one; see the plot below. Is there a pattern in departure delay depending on when the flight is scheduled to depart? Interestingly, there seem to be only two blocks of time where flights depart.

ggplot(data = alaska_flights, mapping = aes(x = dep_time, y = dep_delay)) +
  geom_point()

          -

          (LC3.7) Why is setting the alpha argument value useful with scatterplots? What further information does it give you that a regular scatterplot cannot?

          -

          Solution: Why is setting the alpha argument value useful with scatterplots? What further information does it give you that a regular scatterplot cannot? It thins out the points so we address overplotting. But more importantly it hints at the (statistical) density and distribution of the points: where are the points concentrated, where do they occur. We will see more about densities and distributions in Chapter 6 when we switch gears to statistical topics.

          -

          (LC3.8) After viewing the Figure 3.4 above, give an approximate range of arrival delays and departure delays that occur the most frequently. How has that region changed compared to when you observed the same plot without the alpha = 0.2 set in Figure 3.2?

          -

          Solution: After viewing the Figure 3.4 above, give a range of arrival delays and departure delays that occur most frequently? How has that region changed compared to when you observed the same plot without the alpha = 0.2 set in Figure 3.2? The lower plot suggests that most Alaska flights from NYC depart between 12 minutes early and on time and arrive between 50 minutes early and on time.

          -

          (LC3.9) Take a look at both the weather and early_january_weather data frames by running View(weather) and View(early_january_weather) in the console. In what respect do these data frames differ?

          -

          Solution: Take a look at both the weather and early_january_weather data frames by running View(weather) and View(early_january_weather) in the console. In what respect do these data frames differ? The rows of early_january_weather are a subset of weather.

          -

          (LC3.10) View() the flights data frame again. Why does the time_hour variable uniquely identify the hour of the measurement whereas the hour variable does not?

          -

Solution: Because to uniquely identify an hour, we need the year/month/day/hour combination, whereas there are only 24 possible values of hour.

          -

          (LC3.11) Why should linegraphs be avoided when there is not a clear ordering of the horizontal axis?

          -

          Solution: Why should linegraphs be avoided when there is not a clear ordering of the horizontal axis? Because lines suggest connectedness and ordering.

          -

          (LC3.12) Why are linegraphs frequently used when time is the explanatory variable?

          -

          Solution: Why are linegraphs frequently used when time is the explanatory variable? Because time is sequential: subsequent observations are closely related to each other.

          -

          (LC3.13) Plot a time series of a variable other than temp for Newark Airport in the first 15 days of January 2013.

          -

Solution: Humidity is a good one to look at, since it is very closely related to the cycles of a day.

ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = humid)) +
  geom_line()

          -

          (LC3.14) What does changing the number of bins from 30 to 60 tell us about the distribution of temperatures?

          -

Solution: The distribution doesn’t change much. But by refining the bin width, we see that the temperature data has a high degree of accuracy. What do I mean by accuracy? Looking at the temp variable by View(weather), we see that the precision of each temperature recording is 2 decimal places.

          -

          (LC3.15) Would you classify the distribution of temperatures as symmetric or skewed?

          -

          Solution: It is rather symmetric, i.e. there are no long tails on only one side of the distribution

          -

          (LC3.16) What would you guess is the “center” value in this distribution? Why did you make that choice?

          -

          Solution: The center is around 55.2603921°F. By running the summary() command, we see that the mean and median are very similar. In fact, when the distribution is symmetric the mean equals the median.
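A minimal sketch of the comparison described above, assuming the weather data frame from nycflights13 is loaded:

# summary() reports the mean and median (among other statistics) of temp;
# for a roughly symmetric distribution these two measures of center are close.
summary(weather$temp)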

          -

          (LC3.17) Is this data spread out greatly from the center or is it close? Why?

          -

          Solution: This can only be answered relatively speaking! Let’s pick things to be relative to Seattle, WA temperatures:

While it appears that Seattle weather has a similar center of 55°F, its temperatures are almost entirely between 35°F and 75°F, for a range of about 40°F. Seattle temperatures are much less spread out than New York, i.e. much more consistent over the year. New York, on the other hand, has much colder days in the winter and much hotter days in the summer. Expressed differently, the middle 50% of values, as delineated by the interquartile range, is 30°F.

          -

          (LC3.18) What other things do you notice about the faceted plot above? How does a faceted plot help us see relationships between two variables?

          -

          Solution:

          -
• Certain months have much more consistent weather (August in particular), while others have crazy variability, like January and October, representing changes in the seasons.
• The two variables whose relationship we are seeing are temp and month.

          (LC3.19) What do the numbers 1-12 correspond to in the plot above? What about 25, 50, 75, 100?

          -

          Solution:

          -
• While month is technically a number between 1-12, we’re viewing it as a categorical variable here. Specifically, an ordinal categorical variable since there is an ordering to the categories.
• 25, 50, 75, 100 are temperatures.

          (LC3.20) For which types of data-sets would these types of faceted plots not work well in comparing relationships between variables? Give an example describing the nature of these variables and other important characteristics.

          -

          Solution:

          -
• We’d have 365 facets to look at. Way too many.
• We don’t really care about day-to-day fluctuation in weather so much, but maybe more about week-to-week variation. We’d like to focus on seasonal trends.

          (LC3.21) Does the temp variable in the weather data-set have a lot of variability? Why do you say that?

          -

          Solution: Again, like in LC (LC3.17), this is a relative question. I would say yes, because in New York City, you have 4 clear seasons with different weather. Whereas in Seattle WA and Portland OR, you have two seasons: summer and rain!

          -

          (LC3.22) What does the dot at the bottom of the plot for May correspond to? Explain what might have occurred in May to produce this point.

          -

          Solution: It appears to be an outlier. Let’s revisit the use of the filter command to hone in on it. We want all data points where the month is 5 and temp<25

weather %>% 
  filter(month == 5 & temp < 25)
# A tibble: 1 x 15
  origin  year month   day  hour  temp  dewp humid wind_dir wind_speed wind_gust
  <chr>  <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
1 JFK     2013     5     8    22  13.1  12.0  95.3       80       8.06        NA
# ... with 4 more variables: precip <dbl>, pressure <dbl>, visib <dbl>,
#   time_hour <dttm>

          There appears to be only one hour and only at JFK that recorded 13.1 F (-10.5 C) in the month of May. This is probably a data entry mistake! Why wasn’t the weather at least similar at EWR (Newark) and LGA (La Guardia)?

          -

          (LC3.23) Which months have the highest variability in temperature? What reasons do you think this is?

          -

          Solution: We are now interested in the spread of the data. One measure some of you may have seen previously is the standard deviation. But in this plot we can read off the Interquartile Range (IQR):

          -
            -
          • The distance from the 1st to the 3rd quartiles i.e. the length of the boxes
          • -
          • You can also think of this as the spread of the middle 50% of the data
          • -
          -

          Just from eyeballing it, it seems

          -
            -
          • November has the biggest IQR, i.e. the widest box, so has the most variation in temperature
          • -
          • August has the smallest IQR, i.e. the narrowest box, so is the most consistent temperature-wise
          • -
          -

Here’s how we compute the exact IQR values for each month (we’ll see this more in depth in Chapter 5 of the text):

1. group the observations by month, then
2. for each group, i.e. month, summarize it by applying the summary statistic function IQR(), while making sure to skip over missing data via na.rm = TRUE, then
3. arrange the table in descending order of IQR

weather %>%
  group_by(month) %>%
  summarize(IQR = IQR(temp, na.rm = TRUE)) %>%
  arrange(desc(IQR))

month   IQR
   11 16.02
   12 14.04
    1 13.77
    9 12.06
    4 12.06
    5 11.88
    6 10.98
   10 10.98
    2 10.08
    7  9.18
    3  9.00
    8  7.02

          (LC3.24) We looked at the distribution of a numerical variable over a categorical variable here with this boxplot. Why can’t we look at the distribution of one numerical variable over the distribution of another numerical variable? Say, temperature across pressure, for example?

          -

          Solution: Because we need a way to group many numerical observations together, say by grouping by month. For pressure, we have near unique values for pressure, i.e. no groups, so we can’t make boxplots.

          -

          (LC3.25) Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram?

          -

Solution: In a histogram, the bin corresponding to where an outlier lies may not be high enough for us to see. In a boxplot, outliers are explicitly labelled separately.

          -

          (LC3.26) Why are histograms inappropriate for visualizing categorical variables?

          -

          Solution: Histograms are for numerical variables i.e. the horizontal part of each histogram bar represents an interval, whereas for a categorical variable each bar represents only one level of the categorical variable.

          -

          (LC3.27) What is the difference between histograms and barplots?

          -

          Solution: See above.

          -

          (LC3.28) How many Envoy Air flights departed NYC in 2013?

          -

          Solution: Envoy Air is carrier code MQ and thus 26397 flights departed NYC in 2013.
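A minimal sketch of how one could check this count with dplyr (an illustration, not from the text):

# Envoy Air has carrier code "MQ"; count its departures from NYC in 2013.
flights %>% 
  filter(carrier == "MQ") %>% 
  nrow()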

          -

          (LC3.29) What was the seventh highest airline in terms of departed flights from NYC in 2013? How could we better present the table to get this answer quickly?

          -

          Solution: What a pain! We’ll see in Chapter 5 on Data Wrangling that applying arrange(desc(n)) will sort this table in descending order of n!
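As a rough sketch of what that would look like, assuming the flights data frame and dplyr are loaded (count() and arrange() are covered in the data wrangling chapter):

# Tabulate departed flights per carrier and sort in descending order,
# so the seventh row answers the question directly.
flights %>% 
  count(carrier) %>% 
  arrange(desc(n))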

          -

          (LC3.30) Why should pie charts be avoided and replaced by barplots?

          -

          Solution: In our opinion, comparisons using horizontal lines are easier than comparing angles and areas of circles.

          -

          (LC3.31) What is your opinion as to why pie charts continue to be used?

          -

          Solution: Legacy?

          -

          (LC3.32) What kinds of questions are not easily answered by looking at the above figure?

          -

          Solution: Because the red, green, and blue bars don’t all start at 0 (only red does), it makes comparing counts hard.

          -

          (LC3.33) What can you say, if anything, about the relationship between airline and airport in NYC in 2013 in regards to the number of departing flights?

          -

Solution: The different airlines prefer different airports. For example, United is mostly a Newark carrier and JetBlue is a JFK carrier. If airlines didn’t prefer airports, each color would be roughly one third of each bar.

          -

          (LC3.34) Why might the side-by-side (AKA dodged) barplot be preferable to a stacked barplot in this case?

          -

Solution: We can easily compare the different airports for a given carrier using a single comparison line, i.e. things are lined up.

          -

          (LC3.35) What are the disadvantages of using a side-by-side (AKA dodged) barplot, in general?

          -

          Solution: It is hard to get totals for each airline.

          -

          (LC3.36) Why is the faceted barplot preferred to the side-by-side and stacked barplots in this case?

          -

          Solution: Not that different than using side-by-side; depends on how you want to organize your presentation.

          -

          (LC3.37) What information about the different carriers at different airports is more easily seen in the faceted barplot?

          -

          Solution: Now we can also compare the different carriers within a particular airport easily too. For example, we can read off who the top carrier for each airport is easily using a single horizontal line.

          -
          -
          -
          -

          6.3 Chapter 4 Solutions

library(dplyr)
library(ggplot2)
library(nycflights13)
library(tidyr)
library(readr)

          (LC4.1) Consider the following data frame of average number of servings of beer, spirits, and wine consumption in three countries as reported in the FiveThirtyEight article Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?

# A tibble: 3 x 4
  country     beer_servings spirit_servings wine_servings
  <chr>               <int>           <int>         <int>
1 Canada                240             122           100
2 South Korea           140              16             9
3 USA                   249             158            84

          This data frame is not in tidy format. What would it look like if it were?

          -

          Solution: There are three variables of information included: country, alcohol type, and number of servings. In tidy format, each of these variables of information are included in their own column.

# A tibble: 9 x 3
  country     `alcohol type` servings
  <chr>       <chr>             <int>
1 Canada      beer                240
2 Canada      spirit              122
3 Canada      wine                100
4 South Korea beer                140
5 South Korea spirit               16
6 South Korea wine                  9
7 USA         beer                249
8 USA         spirit              158
9 USA         wine                 84

          Note that how the rows are sorted is inconsequential in whether or not the data frame is in tidy format. In other words, the following data frame sorted by alcohol type instead of country is equally in tidy format.

# A tibble: 9 x 3
  country     `alcohol type` servings
  <chr>       <chr>             <int>
1 Canada      beer                240
2 South Korea beer                140
3 USA         beer                249
4 Canada      spirit              122
5 South Korea spirit               16
6 USA         spirit              158
7 Canada      wine                100
8 South Korea wine                  9
9 USA         wine                 84

          (LC4.2) What properties of the observational unit do each of lat, lon, alt, tz, dst, and tzone describe for the airports data frame? Note that you may want to use ?airports to get more information.

          -

Solution: lat and lon represent the airport geographic coordinates, alt is the altitude above sea level of the airport (run airports %>% filter(faa == "DEN") to see the altitude of Denver International Airport), tz is the time zone difference with respect to GMT in London UK, dst is the daylight savings time zone, and tzone is the time zone label.

          -

          (LC4.3) Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions.

          -

          Solution:

• In the weather example in LC3.8, the combination of origin, year, month, day, and hour are identification variables, as they identify the observation in question.
• Anything else pertains to measurements of the observations: temp, humid, wind_speed, etc.

          (LC4.4) Convert the dem_score data frame into a tidy data frame and assign the name of dem_score_tidy to the resulting long-formatted data frame.

          -

          Solution: Running the following in the console:

dem_score_tidy <- gather(data = dem_score, key = year, value = democracy_score, -country)

          Let’s now compare the dem_score and dem_score_tidy. dem_score has democracy score information for each year in columns, whereas in dem_score_tidy there are explicit variables year and democracy_score. While both representations of the data contain the same information, we can only use ggplot() to create plots using the dem_score_tidy data frame.

dem_score
# A tibble: 96 x 10
   country    `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` `1992`
   <chr>       <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
 1 Albania        -9     -9     -9     -9     -9     -9     -9     -9      5
 2 Argentina      -9     -1     -1     -9     -9     -9     -8      8      7
 3 Armenia        -9     -7     -7     -7     -7     -7     -7     -7      7
 4 Australia      10     10     10     10     10     10     10     10     10
 5 Austria        10     10     10     10     10     10     10     10     10
 6 Azerbaijan     -9     -7     -7     -7     -7     -7     -7     -7      1
 7 Belarus        -9     -7     -7     -7     -7     -7     -7     -7      7
 8 Belgium        10     10     10     10     10     10     10     10     10
 9 Bhutan        -10    -10    -10    -10    -10    -10    -10    -10    -10
10 Bolivia        -4     -3     -3     -4     -7     -7      8      9      9
# ... with 86 more rows
dem_score_tidy
# A tibble: 864 x 3
   country    year  democracy_score
   <chr>      <chr>           <int>
 1 Albania    1952               -9
 2 Argentina  1952               -9
 3 Armenia    1952               -9
 4 Australia  1952               10
 5 Austria    1952               10
 6 Azerbaijan 1952               -9
 7 Belarus    1952               -9
 8 Belgium    1952               10
 9 Bhutan     1952              -10
10 Bolivia    1952               -4
# ... with 854 more rows
          -

          (LC4.5) Read in the life expectancy data stored at https://moderndive.com/data/le_mess.csv and convert it to a tidy data frame.

          -

          Solution: The code is similar

life_expectancy <- read_csv('https://moderndive.com/data/le_mess.csv')
life_expectancy_tidy <- gather(data = life_expectancy, key = year, value = life_expectancy, -country)

We observe the same structure with respect to year in life_expectancy vs life_expectancy_tidy as we did in dem_score vs dem_score_tidy:

life_expectancy

# A tibble: 202 x 67
   country `1951` `1952` `1953` `1954` `1955` `1956` `1957` `1958` `1959` `1960`
   <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
 1 Afghan…   27.1   27.7   28.2   28.7   29.3   29.8   30.3   30.9   31.4   31.9
 2 Albania   54.7   55.2   55.8   56.6   57.4   58.4   59.5   60.6   61.8   62.9
 3 Algeria   43.0   43.5   44.0   44.4   44.9   45.4   45.9   46.4   47.0   47.5
 4 Angola    31.0   31.6   32.1   32.7   33.2   33.8   34.3   34.9   35.4   36.0
 5 Antigu…   58.3   58.8   59.3   59.9   60.4   60.9   61.4   62.0   62.5   63.0
 6 Argent…   61.9   62.5   63.1   63.6   64.0   64.4   64.7   65     65.2   65.4
 7 Armenia   62.7   63.1   63.6   64.1   64.5   65     65.4   65.9   66.4   66.9
 8 Aruba     59.0   60.0   61.0   61.9   62.7   63.4   64.1   64.7   65.2   65.7
 9 Austra…   68.7   69.1   69.7   69.8   70.2   70.0   70.3   70.9   70.4   70.9
10 Austria   65.2   66.8   67.3   67.3   67.6   67.7   67.5   68.5   68.4   68.8
# ... with 192 more rows, and 56 more variables: `1961` <dbl>, `1962` <dbl>,
#   `1963` <dbl>, `1964` <dbl>, `1965` <dbl>, `1966` <dbl>, `1967` <dbl>,
#   `1968` <dbl>, `1969` <dbl>, `1970` <dbl>, `1971` <dbl>, `1972` <dbl>,
#   `1973` <dbl>, `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>,
#   `1978` <dbl>, `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>,
#   `1983` <dbl>, `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>,
#   `1988` <dbl>, `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>,
#   `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>,
#   `1998` <dbl>, `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>,
#   `2003` <dbl>, `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>,
#   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
#   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>

life_expectancy_tidy

# A tibble: 13,332 x 3
   country             year  life_expectancy
   <chr>               <chr>           <dbl>
 1 Afghanistan         1951             27.1
 2 Albania             1951             54.7
 3 Algeria             1951             43.0
 4 Angola              1951             31.0
 5 Antigua and Barbuda 1951             58.3
 6 Argentina           1951             61.9
 7 Armenia             1951             62.7
 8 Aruba               1951             59.0
 9 Australia           1951             68.7
10 Austria             1951             65.2
# ... with 13,322 more rows

(LC4.6) What are common characteristics of "tidy" datasets?

Solution: Rows correspond to observations, while columns correspond to variables.

(LC4.7) What makes "tidy" datasets useful for organizing data?

Solution: Tidy datasets are an organized way of viewing data. We'll see later that this format is required for the ggplot2 and dplyr packages for data visualization and wrangling.

(LC4.8) What are some advantages of data in normal forms? What are some disadvantages?

Solution: When datasets are in normal form, we can easily join them with other datasets! For example, can we join the flights data with the planes data? We'll see this more in Chapter 5!
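As a minimal sketch of such a join (assuming the nycflights13 package and the dplyr join verbs covered in Chapter 5), we can match each flight to information about the plane that flew it via the shared tailnum variable:

library(dplyr)
library(nycflights13)

# Hypothetical illustration: join flights to planes by their common key tailnum
flights %>% 
  inner_join(planes, by = "tailnum")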


          6.4 Chapter 5 Solutions

library(dplyr)
library(ggplot2)
library(nycflights13)

          (LC5.1) What’s another way using the “not” operator ! we could filter only the rows that are not going to Burlington, VT nor Seattle, WA in the flights data frame? Test this out using the code above.

Solution:

# Original in book
not_BTV_SEA <- flights %>% 
  filter(!(dest == "BTV" | dest == "SEA"))

# Alternative way
not_BTV_SEA <- flights %>% 
  filter(!dest == "BTV" & !dest == "SEA")

# Yet another way
not_BTV_SEA <- flights %>% 
  filter(dest != "BTV" & dest != "SEA")

          (LC5.2) Say a doctor is studying the effect of smoking on lung cancer for a large number of patients who have records measured at five year intervals. She notices that a large number of patients have missing data points because the patient has died, so she chooses to ignore these patients in her analysis. What is wrong with this doctor’s approach?

Solution: The missing patients may have died of lung cancer! So ignoring them might seriously bias your results! It is very important to think about what the consequences of ignoring missing data are for your analysis. Ask yourself:

• Is there a systematic reason why certain values are missing? If so, you might be biasing your results!
• If there isn't, then it might be ok to "sweep missing values under the rug."

          (LC5.3) Modify the above summarize function to create summary_temp to also use the n() summary function: summarize(count = n()). What does the returned value correspond to?

Solution: It corresponds to a count of the number of observations/rows:

weather %>% 
  summarize(count = n())

# A tibble: 1 x 1
  count
  <int>
1 26115

          (LC5.4) Why doesn’t the following code work? Run the code line by line instead of all at once, and then look at the data. In other words, run summary_temp <- weather %>% summarize(mean = mean(temp, na.rm = TRUE)) first.

summary_temp <- weather %>%   
  summarize(mean = mean(temp, na.rm = TRUE)) %>% 
  summarize(std_dev = sd(temp, na.rm = TRUE))

          Solution: Consider the output of only running the first two lines:

weather %>%   
  summarize(mean = mean(temp, na.rm = TRUE))

# A tibble: 1 x 1
   mean
  <dbl>
1  55.3

This is because after the first summarize(), the variable temp disappears, as it has been collapsed down to the single value mean. So when we try to run the second summarize(), it can't find the variable temp to compute the standard deviation of.
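One way to get both summaries, assuming the intent is a single row summarizing temp, is to request them inside the same summarize() call:

summary_temp <- weather %>% 
  summarize(mean = mean(temp, na.rm = TRUE),
            std_dev = sd(temp, na.rm = TRUE))
summary_temp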


          (LC5.5) Recall from Chapter 3 when we looked at plots of temperatures by months in NYC. What does the standard deviation column in the summary_monthly_temp data frame tell us about temperatures in New York City throughout the year?

Solution:

month    mean      std_dev
1        35.63566  10.224635
2        34.27060   6.982378
3        39.88007   6.249278
4        51.74564   8.786168
5        61.79500   9.681644
6        72.18400   7.546371
7        80.06622   7.119898
8        74.46847   5.191615
9        67.37129   8.465902
10       60.07113   8.846035
11       44.99043  10.443805
12       38.44180   9.982432

          The standard deviation is a quantification of spread and variability. We see that the period in November, December, and January has the most variation in weather, so you can expect very different temperatures on different days.


          (LC5.6) What code would be required to get the mean and standard deviation temperature for each day in 2013 for NYC?

Solution:

summary_temp_by_day <- weather %>% 
  group_by(year, month, day) %>% 
  summarize(mean = mean(temp, na.rm = TRUE),
            std_dev = sd(temp, na.rm = TRUE))
summary_temp_by_day

# A tibble: 364 x 5
# Groups:   year, month [?]
    year month   day  mean std_dev
   <dbl> <dbl> <int> <dbl>   <dbl>
 1  2013     1     1  37.0    4.00
 2  2013     1     2  28.7    3.45
 3  2013     1     3  30.0    2.58
 4  2013     1     4  34.9    2.45
 5  2013     1     5  37.2    4.01
 6  2013     1     6  40.1    4.40
 7  2013     1     7  40.6    3.68
 8  2013     1     8  40.1    5.77
 9  2013     1     9  43.2    5.40
10  2013     1    10  43.8    2.95
# ... with 354 more rows

Note: group_by(day) is not enough, because day is a value between 1 and 31; to identify a unique day we need to group_by(year, month, day).

library(dplyr)
library(nycflights13)

summary_temp_by_month <- weather %>% 
  group_by(month) %>% 
  summarize(mean = mean(temp, na.rm = TRUE),
            std_dev = sd(temp, na.rm = TRUE))

          (LC5.7) Recreate by_monthly_origin, but instead of grouping via group_by(origin, month), group variables in a different order group_by(month, origin). What differs in the resulting dataset?

Solution:

by_monthly_origin <- flights %>% 
  group_by(month, origin) %>% 
  summarize(count = n())
by_monthly_origin

month  origin  count
1      EWR      9893
1      JFK      9161
1      LGA      7950
2      EWR      9107
2      JFK      8421
2      LGA      7423
3      EWR     10420
3      JFK      9697
3      LGA      8717
4      EWR     10531
4      JFK      9218
4      LGA      8581
5      EWR     10592
5      JFK      9397
5      LGA      8807
6      EWR     10175
6      JFK      9472
6      LGA      8596
7      EWR     10475
7      JFK     10023
7      LGA      8927
8      EWR     10359
8      JFK      9983
8      LGA      8985
9      EWR      9550
9      JFK      8908
9      LGA      9116
10     EWR     10104
10     JFK      9143
10     LGA      9642
11     EWR      9707
11     JFK      8710
11     LGA      8851
12     EWR      9922
12     JFK      9146
12     LGA      9067

The difference is that the rows are now sorted by month first, then origin.
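For comparison, a minimal sketch of the original grouping order from the book (grouping by origin first):

by_origin_monthly <- flights %>% 
  group_by(origin, month) %>% 
  summarize(count = n())
by_origin_monthly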


          (LC5.8) How could we identify how many flights left each of the three airports for each carrier?

Solution: We could summarize the count from each airport using the n() function, which counts rows.

count_flights_by_airport <- flights %>% 
  group_by(origin, carrier) %>% 
  summarize(count = n())
count_flights_by_airport

origin  carrier  count
EWR     9E        1268
EWR     AA        3487
EWR     AS         714
EWR     B6        6557
EWR     DL        4342
EWR     EV       43939
EWR     MQ        2276
EWR     OO           6
EWR     UA       46087
EWR     US        4405
EWR     VX        1566
EWR     WN        6188
JFK     9E       14651
JFK     AA       13783
JFK     B6       42076
JFK     DL       20701
JFK     EV        1408
JFK     HA         342
JFK     MQ        7193
JFK     UA        4534
JFK     US        2995
JFK     VX        3596
LGA     9E        2541
LGA     AA       15459
LGA     B6        6002
LGA     DL       23067
LGA     EV        8826
LGA     F9         685
LGA     FL        3260
LGA     MQ       16928
LGA     OO          26
LGA     UA        8044
LGA     US       13136
LGA     WN        6087
LGA     YV         601

All remarkably similar! Note: the n() function counts rows, whereas the sum(VARIABLE_NAME) function sums all values of a certain numerical variable VARIABLE_NAME.


          (LC5.9) How does the filter operation differ from a group_by followed by a summarize?

Solution:

• filter picks out rows from the original dataset without modifying them, whereas
• group_by %>% summarize computes summaries of numerical variables, and hence reports new values.
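As a minimal sketch of this contrast (using the flights data frame loaded above):

# filter() returns individual rows of flights that meet a condition, unmodified:
flights %>% 
  filter(dest == "BTV")

# group_by() %>% summarize() collapses rows into one new summary row per group:
flights %>% 
  group_by(dest) %>% 
  summarize(count = n())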

          (LC5.10) What do positive values of the gain variable in flights correspond to? What about negative values? And what about a zero value?

Solution:

• Say a flight departed 20 minutes late, i.e. dep_delay = 20.
• Then it arrived 10 minutes late, i.e. arr_delay = 10.
• Then gain = dep_delay - arr_delay = 20 - 10 = 10 is positive, so the flight "made up/gained time in the air."
• A negative gain means the opposite: the arrival delay was larger than the departure delay, so the flight fell even further behind while in the air.
• A gain of 0 means the departure delay and arrival delay were the same, so no time was made up (or lost) in the air. We see in most cases that the gain is near 0 minutes.
• I never understood this. If the pilot says "we're going to make up time in the air" because of a delay by flying faster, why don't you always just fly faster to begin with?
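A minimal sketch of how gain is computed, using mutate() as described above:

flights %>% 
  mutate(gain = dep_delay - arr_delay) %>% 
  select(dep_delay, arr_delay, gain)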

          (LC5.11) Could we create the dep_delay and arr_delay columns by simply subtracting dep_time from sched_dep_time and similarly for arrivals? Try the code out and explain any differences between the result and what actually appears in flights.

Solution: No, because you can't do direct arithmetic on times coded this way. The difference in time between 12:03 and 11:59 is 4 minutes, but 1203 - 1159 = 44.
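As a quick check of why the naive subtraction fails (a sketch assuming the HHMM-coded integer time columns in flights), compare it to the recorded dep_delay:

flights %>% 
  mutate(dep_delay_naive = dep_time - sched_dep_time) %>% 
  select(dep_time, sched_dep_time, dep_delay, dep_delay_naive)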


          (LC5.12) What can we say about the distribution of gain? Describe it in a few sentences using the plot and the gain_summary data frame values.

Solution: Most of the time the gain is a little under zero, and most gains fall between -50 and 50 minutes. There are some extreme cases, however!


          (LC5.13) Looking at Figure 5.7, when joining flights and weather (or, in other words, matching the hourly weather values with each flight), why do we need to join by all of year, month, day, hour, and origin, and not just hour?

Solution: Because hour is simply a value between 0 and 23. To identify a specific hour, we also need to know the year, month, and day it belongs to, as well as the origin airport where the weather was measured.
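A minimal sketch of the join described above (assuming the variable names shown in Figure 5.7):

library(dplyr)
library(nycflights13)

flights_weather_joined <- flights %>% 
  inner_join(weather, by = c("year", "month", "day", "hour", "origin"))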


          (LC5.14) What surprises you about the top 10 destinations from NYC in 2013?


          Solution: This question is subjective! What surprises me is the high number of flights to Boston. Wouldn’t it be easier and quicker to take the train?


          (LC5.15) What are some ways to select all three of the dest, air_time, and distance variables from flights? Give the code showing how to do this in at least three different ways.

Solution:

# The regular way:
flights %>% 
  select(dest, air_time, distance)

# A tibble: 336,776 x 3
   dest  air_time distance
   <chr>    <dbl>    <dbl>
 1 IAH        227     1400
 2 IAH        227     1416
 3 MIA        160     1089
 4 BQN        183     1576
 5 ATL        116      762
 6 ORD        150      719
 7 FLL        158     1065
 8 IAD         53      229
 9 MCO        140      944
10 ORD        138      733
# ... with 336,766 more rows

# Since they are sequential columns in the dataset
flights %>% 
  select(dest:distance)

# A tibble: 336,776 x 3
   dest  air_time distance
   <chr>    <dbl>    <dbl>
 1 IAH        227     1400
 2 IAH        227     1416
 3 MIA        160     1089
 4 BQN        183     1576
 5 ATL        116      762
 6 ORD        150      719
 7 FLL        158     1065
 8 IAD         53      229
 9 MCO        140      944
10 ORD        138      733
# ... with 336,766 more rows

# Not as effective, by removing everything else
flights %>% 
  select(-year, -month, -day, -dep_time, -sched_dep_time, -dep_delay, -arr_time,
         -sched_arr_time, -arr_delay, -carrier, -flight, -tailnum, -origin, 
         -hour, -minute, -time_hour)

# A tibble: 336,776 x 6
   dest  air_time distance  gain hours gain_per_hour
   <chr>    <dbl>    <dbl> <dbl> <dbl>         <dbl>
 1 IAH        227     1400    -9 3.78          -2.38
 2 IAH        227     1416   -16 3.78          -4.23
 3 MIA        160     1089   -31 2.67         -11.6 
 4 BQN        183     1576    17 3.05           5.57
 5 ATL        116      762    19 1.93           9.83
 6 ORD        150      719   -16 2.5           -6.4 
 7 FLL        158     1065   -24 2.63          -9.11
 8 IAD         53      229    11 0.883         12.5 
 9 MCO        140      944     5 2.33           2.14
10 ORD        138      733   -10 2.3           -4.35
# ... with 336,766 more rows

          (LC5.16) How could one use starts_with, ends_with, and contains to select columns from the flights data frame? Provide three different examples in total: one for starts_with, one for ends_with, and one for contains.

Solution:

# Anything that starts with "d"
flights %>% 
  select(starts_with("d"))

# A tibble: 336,776 x 5
     day dep_time dep_delay dest  distance
   <int>    <int>     <dbl> <chr>    <dbl>
 1     1      517         2 IAH       1400
 2     1      533         4 IAH       1416
 3     1      542         2 MIA       1089
 4     1      544        -1 BQN       1576
 5     1      554        -6 ATL        762
 6     1      554        -4 ORD        719
 7     1      555        -5 FLL       1065
 8     1      557        -3 IAD        229
 9     1      557        -3 MCO        944
10     1      558        -2 ORD        733
# ... with 336,766 more rows

# Anything related to delays:
flights %>% 
  select(ends_with("delay"))

# A tibble: 336,776 x 2
   dep_delay arr_delay
       <dbl>     <dbl>
 1         2        11
 2         4        20
 3         2        33
 4        -1       -18
 5        -6       -25
 6        -4        12
 7        -5        19
 8        -3       -14
 9        -3        -8
10        -2         8
# ... with 336,766 more rows

# Anything related to departures:
flights %>% 
  select(contains("dep"))

# A tibble: 336,776 x 3
   dep_time sched_dep_time dep_delay
      <int>          <int>     <dbl>
 1      517            515         2
 2      533            529         4
 3      542            540         2
 4      544            545        -1
 5      554            600        -6
 6      554            558        -4
 7      555            600        -5
 8      557            600        -3
 9      557            600        -3
10      558            600        -2
# ... with 336,766 more rows

          (LC5.17) Why might we want to use the select() function on a data frame?

Solution: To narrow down the data frame so that it's easier to look at, for example when using View().
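For instance, a minimal sketch with a hypothetical choice of columns:

flights %>% 
  select(carrier, flight, dest) %>% 
  View()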


          (LC5.18) Create a new data frame that shows the top 5 airports with the largest arrival delays from NYC in 2013.

Solution:

top_five <- flights %>% 
  group_by(dest) %>% 
  summarize(avg_delay = mean(arr_delay, na.rm = TRUE)) %>% 
  arrange(desc(avg_delay)) %>% 
  top_n(n = 5)

Selecting by avg_delay

top_five

# A tibble: 5 x 2
  dest  avg_delay
  <chr>     <dbl>
1 CAE        41.8
2 TUL        33.7
3 OKC        30.6
4 JAC        28.1
5 TYS        24.1

          (LC5.19) Using the datasets included in the nycflights13 package, compute the available seat miles for each airline sorted in descending order. After completing all the necessary data wrangling steps, the resulting data frame should have 16 rows (one for each airline) and 2 columns (airline name and available seat miles). Here are some hints:

1. Crucial: Unless you are very confident in what you are doing, it is worthwhile not to start coding right away. Rather, first sketch out on paper all the necessary data wrangling steps, not using exact code but rather high-level pseudocode that is informal yet detailed enough to articulate what you are doing. This way you won't confuse what you are trying to do (the algorithm) with how you are going to do it (writing dplyr code).
2. Take a close look at all the datasets using the View() function: flights, weather, planes, airports, and airlines, to identify which variables are necessary to compute available seat miles.
3. Figure 5.7 above, showing how the various datasets can be joined, will also be useful.
4. Consider the data wrangling verbs in Table 5.1 as your toolbox!

There are many possible ways to sketch out such pseudocode. Based on our own pseudocode, let's first display the entire solution.

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  mutate(ASM = seats * distance) %>% 
  group_by(carrier) %>% 
  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
  arrange(desc(ASM))

# A tibble: 16 x 2
   carrier         ASM
   <chr>         <dbl>
 1 UA      15516377526
 2 DL      10532885801
 3 B6       9618222135
 4 AA       3677292231
 5 US       2533505829
 6 VX       2296680778
 7 EV       1817236275
 8 WN       1718116857
 9 9E        776970310
10 HA        642478122
11 AS        314104736
12 FL        219628520
13 F9        184832280
14 YV         20163632
15 MQ          7162420
16 OO          1299835

          Let’s now break this down step-by-step. To compute the available seat miles for a given flight, we need the distance variable from the flights data frame and the seats variable from the planes data frame, necessitating a join by the key variable tailnum as illustrated in Figure 5.7. To keep the resulting data frame easy to view, we’ll select() only these two variables and carrier:

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance)

# A tibble: 284,170 x 3
   carrier seats distance
   <chr>   <int>    <dbl>
 1 UA        149     1400
 2 UA        149     1416
 3 AA        178     1089
 4 B6        200     1576
 5 DL        178      762
 6 UA        191      719
 7 B6        200     1065
 8 EV         55      229
 9 B6        200      944
10 B6        200     1028
# ... with 284,160 more rows

          Now for each flight we can compute the available seat miles ASM by multiplying the number of seats by the distance via a mutate():

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  # Added:
  mutate(ASM = seats * distance)

# A tibble: 284,170 x 4
   carrier seats distance    ASM
   <chr>   <int>    <dbl>  <dbl>
 1 UA        149     1400 208600
 2 UA        149     1416 210984
 3 AA        178     1089 193842
 4 B6        200     1576 315200
 5 DL        178      762 135636
 6 UA        191      719 137329
 7 B6        200     1065 213000
 8 EV         55      229  12595
 9 B6        200      944 188800
10 B6        200     1028 205600
# ... with 284,160 more rows

          Next we want to sum the ASM for each carrier. We achieve this by first grouping by carrier and then summarizing using the sum() function:

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  mutate(ASM = seats * distance) %>% 
  # Added:
  group_by(carrier) %>% 
  summarize(ASM = sum(ASM))

# A tibble: 16 x 2
   carrier         ASM
   <chr>         <dbl>
 1 9E        776970310
 2 AA       3677292231
 3 AS        314104736
 4 B6       9618222135
 5 DL      10532885801
 6 EV       1817236275
 7 F9        184832280
 8 FL        219628520
 9 HA        642478122
10 MQ          7162420
11 OO          1299835
12 UA      15516377526
13 US       2533505829
14 VX       2296680778
15 WN       1718116857
16 YV         20163632

However, because for certain carriers certain flights have missing NA values, the resulting table can also return NA's. We can eliminate these by adding the na.rm = TRUE argument to sum(), telling R that we want to ignore the NA's when computing the sum. We saw this in the earlier section on summarize:

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  mutate(ASM = seats * distance) %>% 
  group_by(carrier) %>% 
  # Modified:
  summarize(ASM = sum(ASM, na.rm = TRUE))

# A tibble: 16 x 2
   carrier         ASM
   <chr>         <dbl>
 1 9E        776970310
 2 AA       3677292231
 3 AS        314104736
 4 B6       9618222135
 5 DL      10532885801
 6 EV       1817236275
 7 F9        184832280
 8 FL        219628520
 9 HA        642478122
10 MQ          7162420
11 OO          1299835
12 UA      15516377526
13 US       2533505829
14 VX       2296680778
15 WN       1718116857
16 YV         20163632

          Finally, we arrange() the data in desc()ending order of ASM.

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  mutate(ASM = seats * distance) %>% 
  group_by(carrier) %>% 
  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
  # Added:
  arrange(desc(ASM))

# A tibble: 16 x 2
   carrier         ASM
   <chr>         <dbl>
 1 UA      15516377526
 2 DL      10532885801
 3 B6       9618222135
 4 AA       3677292231
 5 US       2533505829
 6 VX       2296680778
 7 EV       1817236275
 8 WN       1718116857
 9 9E        776970310
10 HA        642478122
11 AS        314104736
12 FL        219628520
13 F9        184832280
14 YV         20163632
15 MQ          7162420
16 OO          1299835

While the above data frame is correct, the IATA carrier code is not always useful. For example, what carrier is WN? We can address this by joining with the airlines dataset using carrier as the key variable. While this step is not absolutely required, it goes a long way toward making the table easier to make sense of. It is important to be empathetic with the ultimate consumers of your presented data!

flights %>% 
  inner_join(planes, by = "tailnum") %>% 
  select(carrier, seats, distance) %>% 
  mutate(ASM = seats * distance) %>% 
  group_by(carrier) %>% 
  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
  arrange(desc(ASM)) %>% 
  # Added:
  inner_join(airlines, by = "carrier")

# A tibble: 16 x 3
   carrier         ASM name                       
   <chr>         <dbl> <chr>                      
 1 UA      15516377526 United Air Lines Inc.      
 2 DL      10532885801 Delta Air Lines Inc.       
 3 B6       9618222135 JetBlue Airways            
 4 AA       3677292231 American Airlines Inc.     
 5 US       2533505829 US Airways Inc.            
 6 VX       2296680778 Virgin America             
 7 EV       1817236275 ExpressJet Airlines Inc.   
 8 WN       1718116857 Southwest Airlines Co.     
 9 9E        776970310 Endeavor Air Inc.          
10 HA        642478122 Hawaiian Airlines Inc.     
11 AS        314104736 Alaska Airlines Inc.       
12 FL        219628520 AirTran Airways Corporation
13 F9        184832280 Frontier Airlines Inc.     
14 YV         20163632 Mesa Airlines Inc.         
15 MQ          7162420 Envoy Air                  
16 OO          1299835 SkyWest Airlines Inc.      


DataCamp

The introductory basic regression analysis below was the inspiration for a large part of ModernDive co-author Albert Y. Kim's DataCamp course "Modeling with Data in the Tidyverse." If you're interested in complementing your learning below in an interactive online environment, you can access the course on DataCamp. The relevant chapters are Chapter 1 "Introduction to Modeling" and Chapter 2 "Modeling with Basic Regression".

          6.1 One numerical explanatory variable

6.1.1 Exploratory data analysis

TABLE 6.1: Random sample of 5 instructors

6.1.3 Observed/fitted values and residuals

For example, say we are interested in the 21st instructor in this dataset:

TABLE 6.3: Data for 21st instructor
        • residual = 0.153 = 4.4 - 4.25 is the value of the residual for this instructor. In other words, the model was off by 0.153 teaching score units for this instructor.
        • More development of this idea appears in Section 6.3.3 and we encourage you to read that section after you investigate residuals.
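A minimal sketch of how such observed/fitted/residual values can be computed (assuming the evals data, the moderndive package, and the simple score ~ bty_avg model used in this section):

library(moderndive)

# Fit the simple linear regression and extract observed values, fitted values, and residuals
score_model <- lm(score ~ bty_avg, data = evals)
get_regression_points(score_model)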


6.2.1 Exploratory data analysis

We see that this data is left-skewed/negatively skewed: there are a few countries with very low life expectancy that are bringing down the mean life expectancy. However, the median is less sensitive to the effects of such outliers. Hence the median is greater than the mean in this case. Let's proceed by comparing median and mean life expectancy between continents by adding a group_by(continent) to the above code:

          lifeExp_by_continent <- gapminder2007 %>%
             group_by(continent) %>%
             summarize(median = median(lifeExp), mean = mean(lifeExp))
We see now that there are differences in life expectancy between the continents. For example let's focus on only medians. While the median life expectancy across all \(n = 142\) countries in 2007 was 71.935, the median life expectancy across the \(n = 52\) countries in Africa was only 52.927.

Let's create a corresponding visualization. One way to compare the life expectancy of countries in different continents would be via a faceted histogram. Recall we saw back in the Data Visualization chapter, specifically Section 3.6, that facets allow us to split a visualization by the different levels of a categorical variable or factor variable. In Figure 6.10, the variable we facet by is continent, which is categorical with five levels, each corresponding to the five continents of the world.

ggplot(gapminder2007, aes(x = lifeExp)) +
   geom_histogram(binwidth = 5, color = "white") +
   labs(x = "Life expectancy", y = "Number of countries") +
   facet_wrap(~ continent)

Another way would be via a geom_boxplot where we map the categorical variable continent to the \(x\)-axis and the life expectancies within each continent on the \(y\)-axis; we do this in Figure 6.11.

ggplot(gapminder2007, aes(x = continent, y = lifeExp)) +
   geom_boxplot() +
   labs(x = "Continent", y = "Life expectancy (years)")

• Africa and Asia have much more spread/variation in life expectancy as indicated by the interquartile range (the height of the boxes).
• Oceania has almost no spread/variation, but this might in large part be due to the fact there are only two countries in Oceania: Australia and New Zealand.

Now, let's start making comparisons of life expectancy between continents. Let's use Africa as a baseline for comparison. Why Africa? Only because it happened to be first alphabetically; we could have just as appropriately used the Americas as the baseline for comparison. Using the "eyeball test" (just using our eyes to see if anything stands out), we make the following observations about differences in median life expectancy compared to the baseline of Africa:

        1. The median life expectancy of the Americas is roughly 20 years greater.
        2. The median life expectancy of Asia is roughly 20 years greater.

          6.2.2 Linear regression

\[\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= b_0 + b_{\text{Amer}}\cdot 0 + b_{\text{Asia}}\cdot 0 + b_{\text{Euro}}\cdot 0 + b_{\text{Ocean}}\cdot 0\\
&= b_0\\
&= 54.8
\end{align}\]

i.e. all four of the indicator variables are equal to 0. Recall we stated earlier that we would treat Africa as the baseline group for comparison. Furthermore, this value corresponds to the group mean life expectancy for all African countries in Table 6.7.

Next, \(b_{\text{Amer}}\) = continentAmericas = 18.8 is the difference in mean life expectancy of countries in the Americas relative to Africa, or in other words, on average countries in the Americas had life expectancy 18.8 years greater. The fitted value yielded by this equation is:

\[\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= b_0 + b_{\text{Amer}}\cdot 1 + b_{\text{Asia}}\cdot 0 + b_{\text{Euro}}\cdot 0 + b_{\text{Ocean}}\cdot 0\\
&= b_0 + b_{\text{Amer}}\\
&= 72.9
\end{align}\]

          i.e. in this case, only the indicator function \(\mathbb{1}_{\mbox{Amer}}(x)\) is equal to 1, but all others are 0. Recall that 72.9 corresponds to the group mean life expectancy for all countries in the Americas in Table 6.7.

Similarly, \(b_{\text{Asia}}\) = continentAsia = 15.9 is the difference in mean life expectancy of Asian countries relative to African countries, or in other words, on average countries in Asia had life expectancy 15.9 years greater than Africa. The fitted value yielded by this equation is:

\[\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot 0 + 15.9\cdot 1 + b_{\text{Euro}}\cdot 0 + b_{\text{Ocean}}\cdot 0\\
&= 54.8 + 15.9\\
&= 70.7
\end{align}\]

6.2.3 Observed/fitted values and residuals

          What do fitted values \(\widehat{y}\) and residuals \(y - \widehat{y}\) correspond to when the explanatory variable \(x\) is categorical? Let’s investigate these values for the first 10 countries in the gapminder2007 dataset:

regression_points

TABLE 6.9: First 10 out of 142 countries
• The fitted values lifeExp_hat \(\widehat{\text{lifeexp}}\). Countries in Africa have the same fitted value of 54.8, which is the mean life expectancy of Africa. Countries in Asia have the same fitted value of 70.7, which is the mean life expectancy of Asia. This similarly holds for countries in the Americas, Europe, and Oceania.
• The residual column is simply \(y - \widehat{y}\) = lifeexp - lifeexp_hat. These values can be interpreted as a particular country's deviation from its continent's mean life expectancy. For example, the first row of this dataset corresponds to Afghanistan, and the residual of \(-26.9 = 43.8 - 70.7\) is Afghanistan's life expectancy minus the mean life expectancy of all Asian countries.
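A minimal sketch of how these regression points can be reproduced (under assumed names: the 2007 gapminder data, the moderndive package, and a regression of lifeExp on continent as described above):

library(dplyr)
library(gapminder)
library(moderndive)

# Restrict to 2007, fit the model with a categorical explanatory variable,
# and extract observed values, fitted values, and residuals
gapminder2007 <- gapminder %>% 
  filter(year == 2007)
lifeExp_model <- lm(lifeExp ~ continent, data = gapminder2007)
get_regression_points(lifeExp_model)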

6.4 Conclusion

6.4.1 Additional resources

An R script file of all R code used in this chapter is available here.

6.4.2 What's to come?

In this chapter, you've seen what we call "basic regression" when you only have one explanatory variable. In Chapter 7, we'll study multiple regression where we have more than one explanatory variable! In particular, we'll see why we've been conducting the residual analyses from Subsections 11.4.1 and 11.4.2. We are actually verifying some very important assumptions that must be met for the std_error (standard error), p_value, lower_ci and upper_ci (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. Again, don't worry for now if you don't understand what these terms mean. After the next chapter on multiple regression, we'll dive in!


DataCamp

The approach taken below of using more than one variable of information in models using multiple regression is identical to that taken in ModernDive co-author Albert Y. Kim's DataCamp course "Modeling with Data in the Tidyverse." If you're interested in complementing your learning below in an interactive online environment, you can access the course on DataCamp. The relevant chapters are Chapter 1 "Introduction to Modeling" and Chapter 3 "Modeling with Multiple Regression."

          7.1 Two numerical explanatory variables

7.1.1 Exploratory data analysis

          Previously in Figure 6.6, we plotted a “best-fitting” regression line through a set of points where the numerical outcome variable \(y\) was teaching score and a single numerical explanatory variable \(x\) was bty_avg. What is the analogous concept when we have two numerical predictor variables? Instead of a best-fitting line, we now have a best-fitting plane, which is a 3D generalization of lines which exist in 2D. Click here to open an interactive plot of the regression plane shown below in your browser. Move the image around, zoom in, and think about how this plane generalizes the concept of a linear regression line to three dimensions.

FIGURE 7.2: Regression plane
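As a minimal, purely illustrative sketch (with hypothetical simulated variables, not the book's data), a model with two numerical explanatory variables is fit the same way as one with a single explanatory variable; the fitted coefficients now describe a plane rather than a line:

# Hypothetical simulated data with two numerical explanatory variables x1 and x2
set.seed(76)
df <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
df$y <- 3 + 2 * df$x1 - 1 * df$x2 + rnorm(100)

# One intercept plus one slope per explanatory variable defines the regression plane
model_plane <- lm(y ~ x1 + x2, data = df)
coef(model_plane)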

7.1.3 Observed/fitted values and residuals

• Balance_hat corresponds to \(\widehat{y}\) (the fitted value)
• residual corresponds to \(y - \widehat{y}\) (the residual)

          7.2.1 Exploratory data analysis

Furthermore, let's compute the correlation between the two numerical variables we have: score and age. Recall that correlation coefficients only exist between numerical variables. We observe that they are weakly negatively correlated.

7.2.2 Multiple regression: Parallel slopes model

get_regression_table(score_model_2)
7.2.3 Multiple regression: Interaction model

get_regression_table(score_model_interaction)

TABLE 7.6: Regression table

          Let’s summarize these values in a table:

TABLE 7.7: Regression table

7.2.4 Observed/fitted values and residuals

• score_hat corresponds to \(\widehat{y} = \widehat{\mbox{score}}\) (the fitted value)
• residual corresponds to the residual \(y - \widehat{y}\)

TABLE 7.8: Comparison of male and female intercepts and age slopes

Observe that for each group we have their names, the number of red_balls they obtained, and the corresponding proportion out of 50 balls that were red, named prop_red. Observe that we also have a variable replicate enumerating each of the 33 groups; we chose this name because each row can be viewed as one instance of a replicated activity: using the shovel to remove 50 balls and computing the proportion of those balls that are red.

We visualize the distribution of these 33 proportions using a geom_histogram() with binwidth = 0.05 in Figure 8.7, which is appropriate since the variable prop_red is numerical. This computer-generated histogram matches our hand-drawn histogram from the earlier Figure 8.6.

ggplot(tactile_prop_red, aes(x = prop_red)) +
   geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
   labs(x = "Proportion of 50 balls that were red")

8.1.3 Using the shovel 33 times

          FIGURE 8.7: Distribution of 33 proportions based on 33 samples of size 50

8.1.4 What are we doing here?

What we just demonstrated in this activity is the statistical concept of sampling. We would like to know the proportion of the bowl's balls that are red, but because the bowl has a very large number of balls, performing an exhaustive count of the number of red and white balls in the bowl would be very costly in terms of both time and energy. We therefore extract a sample of 50 balls using the shovel. Using this sample of 50 balls, we estimate the proportion of the bowl's balls that are red using the proportion of the shovel's balls that are red. This estimate in our earlier example was 17 red balls out of 50 balls = 34%. Moreover, because we mixed the balls before each use of the shovel, the samples were randomly drawn. Because each sample was drawn at random, the samples were different from each other. Because the samples were different from each other, we obtained the different proportions red observed in Table 8.1. This is known as the concept of sampling variation.

          +

In Section 8.2 we’ll mimic the hands-on sampling activity we just performed in a computer simulation; using a computer will allow us to repeat the above sampling activity much more than 33 times. Using a computer, not only will we be able to repeat the hands-on activity a very large number of times, but we will also be able to repeat it using different-sized shovels.

          +

The purpose of these simulations is to develop an understanding of two key concepts relating to sampling: sampling variation and the role that sample size plays in this variation. To this end, we’ll present you with definitions, terminology, and notation related to sampling in Section 8.3. As with many disciplines, there are definitions, terminology, and notation that seem very inaccessible and even confusing at first. However, as with many difficult topics, if you truly understand the underlying concepts and practice, practice, practice, you’ll be able to master these topics.

          +

To tie the contents of this chapter to the real world, we’ll present an example of one of the most recognizable uses of sampling: polls. In Section 8.4 we’ll look at a particular case study: a 2013 poll on then President Obama’s popularity among young Americans, conducted by the Harvard Kennedy School’s Institute of Politics.

          +

          We’ll close this chapter by generalizing the above sampling from the bowl activity to other scenarios, distinguishing between random sampling and random assignment, presenting the theoretical result underpinning all our results, and presenting a few mathematical formulas that relate to the concepts and ideas explored in this chapter.

          +

          8.2 Computer simulation

          -

          What we performed in Section 8.1 is a simulation of sampling. The crowd-sourced Wikipedia definition of a simulation states: “A simulation is an approximate imitation of the operation of a process or system.”1 One example of simulations in practice are a flight simulators: before pilots in training are allowed to fly an actual plane, they first practice on a computer that attempts to mimic the reality of flying an actual plane as best as possible.

          -

          Now you might be thinking that simulations must necssarily take place on computer. However, this is not necessarily true. Take for example crash test dummies: before cars are made available to the market, automobile engineers test their safety by mimicking the reality for passengeres of being in an automobile crash. To distinguish between these two simulation types, we’ll term a simulation performed in real-life as a “tactile” simulation done with your hands and to the touch as opposed to a “virtual” simulation performed on a computer.

          +

What we performed in Section 8.1 is a simulation of sampling. In other words, we were not in a real-life sampling scenario in order to answer a real-life question, but rather we were mimicking such a scenario with our bowl and shovel. The crowd-sourced Wikipedia definition of a simulation states: “A simulation is an approximate imitation of the operation of a process or system.” One example of a simulation in practice is a flight simulator: before pilots in training are allowed to fly an actual plane, they first practice on a computer that attempts to mimic the reality of flying an actual plane as best as possible.

          +

Now you might be thinking that simulations must necessarily take place on a computer. However, this is not necessarily true. Take for example crash test dummies: before cars are made available to the market, automobile engineers test their safety by mimicking the reality for passengers of being in an automobile crash. To distinguish between these two simulation types, we’ll term a simulation performed in real life a “tactile” simulation, done with your hands and to the touch, as opposed to a “virtual” simulation performed on a computer.

+

          -

          So while in Section 8.1 we performed a “tactile” simulation of sampling using an actual bowl and an actual shovel with our hands, in this section we’ll perform a “virtual” simulation using a virtual bowl and a virtual shovel with our computers.

          +

          So while in Section 8.1 we performed a “tactile” simulation of sampling using an actual bowl and an actual shovel with our hands, in this section we’ll perform a “virtual” simulation using a “virtual” bowl and a “virtual” shovel with our computers.

          -
          -

          8.2.1 Using shovel once

          -

          Let’s start by perfoming the virtual analogue of the tactile sampling simulation we performed in 8.1. We first need a virtual analogue of the bowl seen in Figure 8.1. To this end, we created a data frame called bowl whose rows correspond exactly with the contents of the actual bowl; we’ve included this data frame in the moderndive package.

          +
          +

          8.2.1 Using the virtual shovel once

          +

Let’s start by performing the virtual analogue of the tactile sampling simulation we performed in Section 8.1. We first need a virtual analogue of the bowl seen in Figure 8.1. To this end, we included a data frame bowl in the moderndive package whose rows correspond exactly with the contents of the actual bowl.

bowl
# A tibble: 2,400 x 2
    ball_ID color
 ...
  9       9 red
 10      10 white
# … with 2,390 more rows
          -

          Observe in the output that bowl has 2400 rows, telling us that the bowl contains 2400 equally-sized balls. The first variable ball_ID is used merely as an “identification variable” for this data frame as discussed in Subsection ??; none of the balls in the actual bowl are marked with numbers. The second variable color indicates whether a particular virtual ball i s red or white. Run View(bowl) in RStudio and scroll through the contents to convince yourselves that bowl is indeed a virtual version of the actual bowl in Figure 8.1.

          -

          Now that we have a virtual analogue of our bowl, we now need a virtual analogue for the shovel seen in Figure 8.2 to generate our random samples of 50 balls. We’re going to use the rep_sample_n() function included in the moderndive package that allows us to take repeated/replicated samples of sizen. Run the following and explorevirtual_shovel`’s contents in the spreadsheet viewer.

+

          Observe in the output that bowl has 2400 rows, telling us that the bowl contains 2400 equally-sized balls. The first variable ball_ID is used merely as an “identification variable” for this data frame as discussed in Subsection ??; none of the balls in the actual bowl are marked with numbers. The second variable color indicates whether a particular virtual ball is red or white. View the contents of the bowl in RStudio’s data viewer and scroll through the contents to convince yourselves that bowl is indeed a virtual version of the actual bowl in Figure 8.1.

          +

          Now that we have a virtual analogue of our bowl, we now need a virtual analogue for the shovel seen in Figure 8.2; we’ll use this virtual shovel to generate our virtual random samples of 50 balls. We’re going to use the rep_sample_n() function included in the moderndive package. This function allows us to take repeated, or replicated, samples of size n. Run the following and explore virtual_shovel’s contents in the RStudio viewer.

          virtual_shovel <- bowl %>% 
             rep_sample_n(size = 50)
           View(virtual_shovel)
          @@ -991,10 +986,10 @@

          8.2.1 Using shovel once

          -

          The ball_ID variable identifies which of balls from bowl are included in our sample of 50 balls and color denotes it’s color. However what does the replicate variable indicate? In virtual_shovel’s case, replicate is equal to 1 for all 50 rows. This is telling us that these 50 rows correspond to a first repeated/replicated use of the shovel, in other words our first sample. We’ll see below when we “virtually” take 33 samples below, replicate will take values between 1 and 33. Before we do this, let’s compute the proportion of balls in our virtual sample of size 50 that are red. We’ll be using the dplyr data wrangling verbs you learned in Chapter 4. Let’s breakdown the steps individually:

          -

          First, for each of our 50 sampled balls, identify if it is red or not using the boolean algebra. For every row where color == "red", the boolean TRUE is returned and for every row where color is not equal to "red", the boolean FALSE is returned. Let’s create a new boolean variable is_red using the mutate() function from Section 4.5:

          +

The ball_ID variable identifies which of the balls from bowl are included in our sample of 50 balls and color denotes its color. However what does the replicate variable indicate? In virtual_shovel’s case, replicate is equal to 1 for all 50 rows. This is telling us that these 50 rows correspond to a first repeated/replicated use of the shovel, in other words, our first sample. We’ll see below that when we “virtually” take 33 samples, replicate will take values between 1 and 33. Before we do this, let’s compute the proportion of balls in our virtual sample of size 50 that are red using the dplyr data wrangling verbs you learned in Chapter 4. Let’s break down the steps individually:

          +

First, for each of our 50 sampled balls, identify whether it is red using a test for equality with ==. For every row where color == "red", the Boolean TRUE is returned and for every row where color is not equal to "red", the Boolean FALSE is returned. Let’s create a new Boolean variable is_red using the mutate() function from Section 4.5:

          virtual_shovel %>% 
          -  mutate(is_red = color == "red")
          + mutate(is_red = (color == "red"))
# A tibble: 50 x 4
# Groups:   replicate [1]
   replicate ball_ID color is_red
 ...
# … with 40 more rows

Second, we compute the number of balls out of 50 that are red using the summarize() function. Recall from Section 4.3 that summarize() takes a data frame with many rows and returns a data frame with a single row containing summary statistics that you specify, like mean() and median(). In this case we use sum():

          virtual_shovel %>% 
          -  mutate(is_red = color == "red") %>% 
          +  mutate(is_red = (color == "red")) %>% 
             summarize(num_red = sum(is_red))  
          # A tibble: 1 x 2
             replicate num_red
                 <int>   <int>
           1         1      17
          -

          Why does this work? Because R treats TRUE like the number 1 and FALSE like the number 0. So summing the number of TRUE’s and FALSE’s is equivalent to summing 1’s and 0’s, which in the end which counts the number of balls where color is red.

          +

          Why does this work? Because R treats TRUE like the number 1 and FALSE like the number 0. So summing the number of TRUE’s and FALSE’s is equivalent to summing 1’s and 0’s, which in the end counts the number of balls where color is red. In our case, 17 of the 50 balls were red.
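As a tiny illustration of this point (our addition, not code from the text), summing a logical vector in base R counts its TRUEs:

# TRUE is treated as 1 and FALSE as 0, so the sum counts the TRUEs.
sum(c(TRUE, FALSE, TRUE, FALSE, FALSE))
#> [1] 2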

          Third and last, we compute the proportion of the 50 sampled balls that are red by dividing num_red by 50:

virtual_shovel %>% 
  mutate(is_red = color == "red") %>% 
  summarize(num_red = sum(is_red)) %>% 
  mutate(prop_red = num_red / 50)

  replicate num_red prop_red
      <int>   <int>    <dbl>
1         1      17     0.34
          -

          Let’s make the above code a little more compact and succinct by combining the first mutate() and the summarize() as follows:

          +

          In other words, this “virtual” sample’s balls were 34% red. Let’s make the above code a little more compact and succinct by combining the first mutate() and the summarize() as follows:

          virtual_shovel %>% 
             summarize(num_red = sum(color == "red")) %>% 
             mutate(prop_red = num_red / 50)
  replicate num_red prop_red
      <int>   <int>    <dbl>
1         1      17     0.34
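As an aside (our addition rather than the book’s code), the same fact lets us skip the division entirely: mean() of the logical comparison is itself the proportion of TRUEs. A minimal sketch, assuming the same virtual_shovel data frame:

# The mean of a logical vector is the proportion of TRUEs, i.e. the
# proportion of the 50 sampled balls that are red.
virtual_shovel %>% 
  summarize(prop_red = mean(color == "red"))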
          -

          Great! 44% of virtual_shovel’s 50 balls were red! So based on this particular sample, our guess at the proportion of bowl’s balls that are red is 44%. But remember from our earlier tactile sampling activity, that if we repeated this sampling, we would not necessarily obtain a sample of 50 balls with 44% of them being red; there will likely be some variation.

          -

          In fact in Table 8.2 we displayed 33 such proportions based on 33 tactile samples and then in Figure 8.6 we visualized the distribution of the 33 proportions in a histogram. Let’s now perform the virtual analogue of having 33 groups of students use the sampling shovel!

          +

          Great! 34% of virtual_shovel’s 50 balls were red! So based on this particular sample, our guess at the proportion of the bowl’s balls that are red is 34%. But remember from our earlier tactile sampling activity that if we repeated this sampling, we would not necessarily obtain a sample of 50 balls with 34% of them being red again; there will likely be some variation. In fact in Table 8.2 we displayed 33 such proportions based on 33 tactile samples and then in Figure 8.6 we visualized the distribution of the 33 proportions in a histogram. Let’s now perform the virtual analogue of having 33 groups of students use the sampling shovel!

          -
          -

          8.2.2 Using shovel 33 times

          -

          Recall however in our tactile sampling exercise in Section 8.1 above that we had 33 groups of students each use the shovel, yielding 33 samples of size 50 balls, which we used to then compute 33 proportions. In other words we repeated/replicated the sampling activity 33 times. We can perform this repeated/replicated sampling virtually by once again using our virtual shovel funciton rep_sample_n(), but by adding the reps = 33 argument indicating we want to repeat the sampling 33 times.

          -

          Be sure to scroll through the contents of virtual_samples in RStudio’s spreadsheet viewer.

          +
          +

          8.2.2 Using the virtual shovel 33 times

          +

          Recall that in our tactile sampling exercise in Section 8.1 we had 33 groups of students each use the shovel, yielding 33 samples of size 50 balls, which we then used to compute 33 proportions. In other words we repeated/replicated using the shovel 33 times. We can perform this repeated/replicated sampling virtually by once again using our virtual shovel function rep_sample_n(), but by adding the reps = 33 argument, indicating we want to repeat the sampling 33 times. Be sure to scroll through the contents of virtual_samples in RStudio’s viewer.

          virtual_samples <- bowl %>% 
             rep_sample_n(size = 50, reps = 33)
           View(virtual_samples)
          -

          Observe that while the first 50 rows of replicate are equal to 1 the next 50 are equal to 2. This is indicating that the first 50 rows correspond to the first sample of 50 balls while the next 50 correspond to the second sample of 50 balls. This pattern continues for all reps = 33 replicates and thus virtual_samples has 33 \(\times\) 50 = 1650 rows.

          -

          Let’s now take the data frame virtual_samples with 33 \(\times\) 50 = 1650 rows corresponding to 33 samples of size 50 and compute the resulting 33 proportions red. We’ll use the same dplyr verbs as we did in the previous section, but this time with an additional group_by() the replicate variable. Recall from Section 4.4 that by assigning grouping “meta-data” before summarizing(), we’ll obtain 33 different proportions red:

          +

          Observe that while the first 50 rows of replicate are equal to 1, the next 50 rows of replicate are equal to 2. This is telling us that the first 50 rows correspond to the first sample of 50 balls while the next 50 correspond to the second sample of 50 balls. This pattern continues for all reps = 33 replicates and thus virtual_samples has 33 \(\times\) 50 = 1650 rows.
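As a quick sanity check (our addition, assuming dplyr is loaded along with the virtual_samples data frame just created), we can verify that each of the 33 replicates indeed contains 50 rows:

# Count the rows within each replicate; each of the 33 counts should be 50.
virtual_samples %>% 
  count(replicate)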

          +

Let’s now take the data frame virtual_samples with 33 \(\times\) 50 = 1650 rows corresponding to 33 samples of size 50 balls and compute the resulting 33 proportions red. We’ll use the same dplyr verbs as we did in the previous section, but this time with an additional group_by() of the replicate variable. Recall from Section 4.4 that by assigning grouping “meta-data” with group_by() before summarizing, we’ll obtain 33 different proportions red:

          virtual_prop_red <- virtual_samples %>% 
             group_by(replicate) %>% 
             summarize(red = sum(color == "red")) %>% 
             mutate(prop_red = red / 50)
           View(virtual_prop_red)
          -

          Let’s display only the first 10 out of 33 rows of virtual_prop_red’s contents in Table 8.1.

          +

Let’s display only the first 10 out of 33 rows of virtual_prop_red’s contents in Table 8.3. As one would expect, there is variation in the resulting prop_red proportions red for these first 10 out of 33 repeated/replicated samples.

          +
TABLE 8.3: First 10 out of 33 virtual proportions of 50 balls that are red.
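A minimal way to view these same first 10 rows at the console (our addition; it assumes the virtual_prop_red data frame created above and that dplyr is loaded):

# Display the first 10 of the 33 proportions red.
virtual_prop_red %>% 
  slice(1:10)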

          8.2.2 Using shovel 33 times

          FIGURE 8.8: Distribution of 33 proportions based on 33 samples of size 50

          -

          Observe that occasionally we obtained proportions red that are less than 0.3 = 30%, while occasionally we obtained proportions that are greater than 0.45 = 45%. However, the most frequently occurring proportions red out of 50 balls were between 35% and 40% (for 11 out 33 samples). Why do we have these differences in proportions red? Because of sampling variation.

          -

          Let’s now compare our virtual results with our tactile results from the previous section in Figure 8.9. We see that both histograms, in other words the distribution of the 33 proportions red, are somewhat somewhat similar in their center and spread, although not identical; these slight differences are again due to random variation. Furthermore both distributions are somewhat bell-shaped.

          +

Observe that occasionally we obtained proportions red that are less than 0.3 = 30%, while on the other hand we occasionally obtained proportions that are greater than 0.45 = 45%. However, the most frequently occurring proportions red out of 50 balls were between 35% and 40% (for 11 out of 33 samples). Why do we have these differences in proportions red? Because of sampling variation.

          +

          Let’s now compare our virtual results with our tactile results from the previous section in Figure 8.9. We see that both histograms, in other words the distribution of the 33 proportions red, are somewhat similar in their center and spread although not identical. These slight differences are again due to random variation. Furthermore both distributions are somewhat bell-shaped.

- Two distributions of 33 proportions based on 33 samples of size 50
+ Comparing 33 virtual and 33 tactile proportions red.

- FIGURE 8.9: Two distributions of 33 proportions based on 33 samples of size 50
+ FIGURE 8.9: Comparing 33 virtual and 33 tactile proportions red.

          -
          -

          8.2.3 Using shovel 1000 times

          -

          Now say we want study the variation in proportions red not based on 33 samples but rather a very large number of samples, say 1000 samples. We have two choices at this point. We could make our students manually take 1000 samples of 50 balls and compute the corresponding 1000 proportion red out 50 balls. However, this would be cruel and unusual, as it this would be very tedious and time consuming. This is however where computers excel: for automating long and repetitive tasks and having them performed very quickly. Therefore at this point we will abandon tactile sampling in favor of only virtual sampling. Let’s once again use the rep_sample_n() function with sample size set to 50, but the number of replicates reps = 1000.

          -

          Be sure to scroll through the contents of virtual_samples in RStudio’s spreadsheet viewer.

          +
          +

          8.2.3 Using the virtual shovel 1000 times

          +

Now say we want to study the variation in proportions red not based on 33 repeated/replicated samples, but rather on a very large number of samples, say 1000. We have two choices at this point. We could have our students manually take 1000 samples of 50 balls and compute the corresponding 1000 proportions red out of 50 balls. This would be cruel and unusual however, as it would be very tedious and time-consuming. This is where computers excel: automating long and repetitive tasks while performing them very quickly. Therefore at this point we will abandon tactile sampling in favor of only virtual sampling. Let’s once again use the rep_sample_n() function with the sample size set to 50, but this time with the number of replicates reps = 1000. Be sure to scroll through the contents of virtual_samples in RStudio’s viewer.

          virtual_samples <- bowl %>% 
             rep_sample_n(size = 50, reps = 1000)
           View(virtual_samples)
          -

          Observe that now virtual_samples has 1000 \(\times\) 50 = 50,000 rows, instead of the 33 \(\times\) 50 = 1650 rows from earlier. Using the same code as earlier, let’s take the data frame virtual_samples with 1000 \(\times\) 50 = 50,000 and compute the resulting 33 proportions red.

          +

Observe that now virtual_samples has 1000 \(\times\) 50 = 50,000 rows, instead of the 33 \(\times\) 50 = 1650 rows from earlier. Using the same code as earlier, let’s take the data frame virtual_samples with its 1000 \(\times\) 50 = 50,000 rows and compute the resulting 1000 proportions red.

          virtual_prop_red <- virtual_samples %>% 
             group_by(replicate) %>% 
             summarize(red = sum(color == "red")) %>% 
             mutate(prop_red = red / 50)
           View(virtual_prop_red)

          Observe that we now have 1000 replicates of prop_red, the proportion of 50 balls that are red. Using the same code as earlier, let’s now visualize the distribution of these 1000 replicates of prop_red in a histogram in Figure 8.10.

          +
          ggplot(virtual_prop_red, aes(x = prop_red)) +
             geom_histogram(binwidth = 0.05, boundary = 0.4, color = "white") +
   labs(x = "Proportion of 50 balls that were red")
          @@ -1228,11 +1222,12 @@ 

          8.2.3 Using shovel 1000 times

          -

          Once again, the most frequently occuring proportions red occur between 35% and 40%. Every now and then, we’d obtain proportions are low as between 20% and 25%, and others as high as between 55% and 60%, but those are rarities. Furthermore observe that we now have much more symmetric and smoother bell-shaped distribution. This distribution is in fact a Normal distribution; see Appendix A for a brief discussion on properties of the Normal distribution.

          +

          Once again, the most frequently occurring proportions red occur between 35% and 40%. Every now and then, we obtain proportions as low as between 20% and 25%, and others as high as between 55% and 60%. These are rare however. Furthermore observe that we now have a much more symmetric and smoother bell-shaped distribution. This distribution is in fact a Normal distribution; see Appendix A for a brief discussion on properties of the Normal distribution.
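To see the bell shape more explicitly, one could overlay a Normal curve whose mean and standard deviation match the 1000 simulated proportions. This sketch is our addition, not part of the original text; it assumes a recent ggplot2 and the virtual_prop_red data frame from above:

# Compute the mean and standard deviation of the 1000 proportions red.
params <- virtual_prop_red %>% 
  summarize(mean = mean(prop_red), sd = sd(prop_red))

# Histogram on the density scale with a matching Normal curve overlaid.
ggplot(virtual_prop_red, aes(x = prop_red)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.05, 
                 boundary = 0.4, color = "white") +
  stat_function(fun = dnorm, 
                args = list(mean = params$mean, sd = params$sd)) +
  labs(x = "Proportion of 50 balls that were red")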

          +

          8.2.4 Using different shovels

          -

          We ask ourselves a question now. Say you had three choices of shovels to extract a sample of balls and compute the corresponding proportion of balls in the shovel that are red:

          +

          Now say instead of just one shovel, you had three choices of shovels to extract a sample of balls with.

          @@ -1249,7 +1244,7 @@

          8.2.4 Using different shovels

          -

          Which would you choose? In our experience, most people would choose the shovel with 100 slots since it has the biggest sample size, and thus would yield the “best” guess of the proportion of the bowl’s 2400 balls that are red. The three shovels above present with three possible sample sizes. Using our newly developed tools for virtual sampling simulations, let’s unpack the effect of having different sample sizes! In other words, for size = 25, size = 50, and size = 100:

          +

          If your goal was still to estimate the proportion of the bowl’s balls that were red, which shovel would you choose? In our experience, most people would choose the shovel with 100 slots since it has the biggest sample size and hence would yield the “best” guess of the proportion of the bowl’s 2400 balls that are red. Using our newly developed tools for virtual sampling simulations, let’s unpack the effect of having different sample sizes! In other words, let’s use rep_sample_n() with size = 25, size = 50, and size = 100, while keeping the number of repeated/replicated samples at 1000:

1. Virtually use the appropriate shovel to generate 1000 samples with size balls.
2. Compute the resulting 1000 replicates of the proportion of the shovel’s balls that are red.
3. Visualize the variation of these 1000 proportions red using a histogram (a code sketch for the first two steps follows this list).
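The code that produces the three sets of 1000 proportions (virtual_prop_red_25, virtual_prop_red_50, and virtual_prop_red_100, used below) is not shown at this point; a minimal sketch of how they could be created, following the same pattern as earlier, assuming the bowl data frame and rep_sample_n() from moderndive:

# n = 25: 1000 virtual samples of 25 balls each, then the proportion red
# within each of the 1000 replicates.
virtual_prop_red_25 <- bowl %>% 
  rep_sample_n(size = 25, reps = 1000) %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 25)

# n = 50: same pattern with the 50-slot shovel.
virtual_prop_red_50 <- bowl %>% 
  rep_sample_n(size = 50, reps = 1000) %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 50)

# n = 100: same pattern with the 100-slot shovel.
virtual_prop_red_100 <- bowl %>% 
  rep_sample_n(size = 100, reps = 1000) %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 100)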

            8.2.4 Using different shovels

          -

          Observe that as the sample size increases, the spread of the 1000 replicates of the proportion red decreases. In other words, as the sample size increases, there are less differences due to sampling variation, and the distribution centers more tightly around the same value. Eyeballing Figure 8.11, things appear to center more tightly around roughly 40%.

          -

          We can be numerically explicit about the amount of spread using the standard deviation: a summary statistic that measures the amount of spread and variation within a numerical variable; see Appendix A for a brief discussion on properties of the standard deviation. For all three sample sizes, compute the standard deviation of sd() of the 1000 proportions red by running the following data wrangling code.

          +

Observe that as the sample size increases, the spread of the 1000 replicates of the proportion red decreases. In other words, as the sample size increases, there are fewer differences due to sampling variation and the distribution centers more tightly around the same value. Eyeballing Figure 8.11, things appear to center tightly around roughly 40%.

          +

          We can be numerically explicit about the amount of spread in our 3 sets of 1000 values of prop_red using the standard deviation: a summary statistic that measures the amount of spread and variation within a numerical variable; see Appendix A for a brief discussion on properties of the standard deviation. For all three sample sizes, let’s compute the standard deviation of the 1000 proportions red by running the following data wrangling code that uses the sd() summary function.

# n = 25
virtual_prop_red_25 %>% 
  summarize(sd = sd(prop_red))

# n = 50
virtual_prop_red_50 %>% 
  summarize(sd = sd(prop_red))

# n = 100
virtual_prop_red_100 %>% 
  summarize(sd = sd(prop_red))
          -

          Let’s compare these 3 measures of spread of the distributions we in Table 8.4.

          +

          Let’s compare these 3 measures of spread of the distributions in Table 8.4.

          @@ -1369,288 +1364,295 @@

          8.2.4 Using different shovels

          -
          -

          8.3 Our goal

          -

          Simply put: study the effects of sampling variation

          -
          -

          8.3.1 What is sampling variation?

          -
          -
          -

          8.3.2 Effect of sample size

          -
          -
          -
          -

          8.4 Sampling framework

          -
          -

          8.4.1 Terminology

          -

          Let’s now define some concepts and terminology important to understand sampling, being sure to tie things back to the above example. You might have to read this a couple times more as you progress throughout this book, as they are very deeply layered concepts. However as we’ll soon see, they are very powerful concepts that open up a whole new world of scientific thinking:

          +

          8.3 Sampling framework

          +

In both our “hands-on” tactile simulations and our “virtual” simulations using a computer, we used sampling for the purpose of estimation: we extracted samples in order to estimate the proportion of the bowl’s balls that are red. We used sampling as a cheaper and less time-consuming approach than performing a full census of all the balls. Our virtual simulations all built up to the results shown in Figure 8.11 and Table 8.4, comparing 1000 proportions red based on samples of size 25, 50, and 100. This was our first attempt at understanding two key concepts relating to sampling for estimation:

-

1. Population: The population is a set of \(N\) observations of interest.
   • Above Ex: Our bowl consisting of \(N=2400\) identically-shaped balls.
2. Population parameter: A population parameter is a numerical summary value about the population. In most settings, this is a value that’s unknown and you wish you knew it.
   • Above Ex: The true population proportion \(p\) of the balls in the bowl that are red.
   • In this scenario the parameter of interest is the proportion, but in others it could be numerical summary values like the mean, median, etc.
3. Census: An exhaustive enumeration/counting of all observations in the population in order to compute the population parameter’s numerical value exactly.
   • Above Ex: This corresponds to manually going over all \(N=2400\) balls and counting the number that are red, thereby allowing us to compute the population proportion \(p\) of the balls that are red exactly.
   • When \(N\) is small, a census is feasible. However, when \(N\) is large, a census can get very expensive, either in terms of time, energy, or money.
   • Ex: the Decennial United States census attempts to exhaustively count the US population. Consequently it is a very expensive, but necessary, procedure.
4. Sampling: Collecting a sample of size \(n\) of observations from the population. Typically the sample size \(n\) is much smaller than the population size \(N\), thereby making sampling a much cheaper procedure than a census.
   • Above Ex: Using the shovel to extract a sample of \(n=50\) balls.
   • It is important to remember that the lowercase \(n\) corresponds to the sample size and uppercase \(N\) corresponds to the population size, thus \(n \leq N\).
5. Point estimates/sample statistics: A summary statistic based on the sample of size \(n\) that estimates the unknown population parameter.
   • Above Ex: it’s the sample proportion \(\widehat{p}\) red of the balls in the sample of size \(n=50\).
   • Key: The sample proportion red \(\widehat{p}\) is an estimate of the true unknown population proportion red \(p\).
6. Representative sampling: A sample is said to be a representative sample if it “looks like the population.” In other words, the sample’s characteristics are a good representation of the population’s characteristics.
   • Above Ex: Does our sample of \(n=50\) balls “look like” the contents of the larger set of \(N=2400\) balls in the bowl?
7. Generalizability: We say a sample is generalizable if any results based on the sample can generalize to the population.
   • Above Ex: Is \(\widehat{p}\) a “good guess” of \(p\)?
   • In other words, can we infer about the true proportion of the balls in the bowl that are red, based on the results of our sample of \(n=50\) balls?
8. Bias: In a statistical sense, we say bias occurs if certain observations in a population have a higher chance of being sampled than others. We say a sampling procedure is unbiased if every observation in a population had an equal chance of being sampled.
   • Above Ex: Did each ball, irrespective of color, have an equal chance of being sampled, meaning the sampling was unbiased? We feel since the balls are all of the same size, there isn’t any bias in the sampling. If, say, the red balls had a much larger diameter than the white ones, then you might have a higher or lower probability of sampling red balls.
9. Random sampling: We say a sampling procedure is random if we sample randomly from the population in an unbiased fashion.
   • Above Ex: As long as you mixed the bowl sufficiently before sampling, your samples of size \(n=50\) balls would be random.

+

1. The effect of sampling variation on our estimates.
2. The effect of sample size on sampling variation.

-

          8.4.2 Sampling for inference

          -

          Why did we go through the trouble of enumerating all the above concepts and terminology?

          -

          The moral of the story:

          +

Let’s now introduce some terminology and notation, as well as statistical definitions, related to sampling. Given the number of new words to learn, you will likely have to read these next three subsections multiple times. Keep in mind however that the concepts underlying this terminology, notation, and these definitions are no different from the concepts underlying our simulations in Sections 8.1 and 8.2; it will simply take time and practice to master them.

          +
          +

          8.3.1 Terminology & notation

          +

          Here is a list of terminology and mathematical notation relating to sampling. For each item, we’ll be sure to tie them to our simulations in Sections 8.1 and 8.2.

          +
+

1. (Study) Population: A (study) population is a collection of individuals or observations about which we are interested. We mathematically denote the population’s size using upper case \(N\). In our simulations the (study) population was the collection of \(N\) = 2400 identically sized red and white balls contained in the bowl.
2. Population parameter: A population parameter is a numerical summary quantity about the population that is unknown, but you wish you knew. For example, when this quantity is a mean, the population parameter of interest is the population mean, which is mathematically denoted with the Greek letter \(\mu\) (pronounced “mu”). In our simulations, however, since we were interested in the proportion of the bowl’s balls that were red, the population parameter is the population proportion, which is mathematically denoted with the letter \(p\).
3. Census: An exhaustive enumeration or counting of all \(N\) individuals or observations in the population in order to compute the population parameter’s value exactly. In our simulations, this would correspond to manually going over all \(N\) = 2400 balls in the bowl, counting the number that are red, and computing the population proportion \(p\) of the balls that are red exactly. When the number \(N\) of individuals or observations in our population is large, as was the case with our bowl, a census can be very expensive in terms of time, energy, and money.
4. Sampling: Sampling is the act of collecting a sample from the population when we don’t have the means to perform a census. We mathematically denote the sample’s size using lower case \(n\), as opposed to upper case \(N\) which denotes the population’s size. Typically the sample size \(n\) is much smaller than the population size \(N\), thereby making sampling a much cheaper procedure than a census. In our simulations, we used shovels with 25, 50, and 100 slots to extract samples of size \(n\) = 25, \(n\) = 50, and \(n\) = 100 balls.
5. Point estimate (AKA sample statistic): A summary statistic computed from the sample that estimates the unknown population parameter. In our simulations, recall that the unknown population parameter was the population proportion, mathematically denoted with \(p\). Our point estimate is the sample proportion: the proportion of the shovel’s balls that are red. In other words, it is our guess of the proportion of the bowl’s balls that are red. We mathematically denote the sample proportion using \(\widehat{p}\); the “hat” on top of the \(p\) indicates that it is an estimate of the unknown population proportion \(p\).
6. Representative sampling: A sample is said to be a representative sample if it is representative of the population. In other words, are the sample’s characteristics a good representation of the population’s characteristics? In our simulations, are the samples of \(n\) balls extracted using our shovels representative of the bowl’s \(N\) = 2400 balls?
7. Generalizability: We say a sample is generalizable if any results based on the sample can generalize to the population. In other words, can the value of the point estimate be generalized to estimate the value of the population parameter well? In our simulations, can we generalize the sample proportions red of our shovels to the population proportion red of the bowl? Using mathematical notation, is \(\widehat{p}\) a “good guess” of \(p\)?
8. Bias: In a statistical sense, we say bias occurs if certain individuals or observations in a population have a higher chance of being included in a sample than others. We say a sampling procedure is unbiased if every observation in a population has an equal chance of being sampled. In our simulations, since each ball had the same size and hence an equal chance of being sampled by our shovels, our samples were unbiased.
9. Random sampling: We say a sampling procedure is random if we sample randomly from the population in an unbiased fashion. In our simulations, this would correspond to sufficiently mixing the bowl before each use of the shovel.

          Phew, that’s a lot of new terminology and notation to learn! Let’s put them all together to describe the paradigm of sampling:

• If the sampling of a sample of size \(n\) is done at random, then
-
• The sample is unbiased and representative of the population, thus
• Any result based on the sample can generalize to the population, thus
• The point estimate/sample statistic is a “good guess” of the unknown population parameter of interest
+
• the sample is unbiased and representative of the population of size \(N\), thus
• any result based on the sample can generalize to the population, thus
• the point estimate is a “good guess” of the unknown population parameter, thus
• instead of performing a census, we can infer about the population using sampling.
          -

          and thus we have inferred about the population based on our sample. In the above example:

          +

          Restricting consideration to a shovel with 50 slots from our simulations,

-
• If we properly mix the balls by say stirring the bowl first, then use the shovel to extract a sample of size \(n=50\), then
• The contents of the shovel will “look like” the contents of the bowl, thus
• Any results based on the sample of \(n=50\) balls can generalize to the large bowl of \(N=2400\) balls, thus
• The sample proportion \(\widehat{p}\) of the \(n=50\) balls in the shovel that are red is a “good guess” of the true population proportion \(p\) of the \(N=2400\) balls that are red.
+
• If we extract a sample of \(n=50\) balls at random, in other words we mix the equally-sized balls before using the shovel, then
• the contents of the shovel are an unbiased representation of the contents of the bowl’s 2400 balls, thus
• any result based on the sample of balls can generalize to the bowl, thus
• the sample proportion \(\widehat{p}\) of the \(n=50\) balls in the shovel that are red is a “good guess” of the population proportion \(p\) of the \(N\)=2400 balls that are red, thus
• instead of manually going over all the balls in the bowl, we can infer about the bowl using the shovel.
          -

          and thus we have inferred some new piece of information about the bowl based on our sample extracted by shovel.

          +

          Note that last word we wrote in bold: infer. The act of “inferring” is to deduce or conclude (information) from evidence and reasoning. In our simulations, we wanted to infer about the proportion of the bowl’s balls that are red. Statistical inference is the theory, methods, and practice of forming judgments about the parameters of a population and the reliability of statistical relationships, typically on the basis of random sampling (Wikipedia). In other words, statistical inference is the act of inference via sampling. In the upcoming Chapter 9 on confidence intervals, we’ll introduce the infer package, which makes statistical inference “tidy” and transparent. It is why this third portion of the book is called “Statistical inference via infer”.

          -

          8.4.3 Statistical definitions

          -

          Sampling distributions are a specific kind of distribution: distributions of point estimates/sample statistics based on samples of size \(n\) used to estimate an unknown population parameter.

          -

          In the case of the histogram in Figure 8.7, its the distribution of the sample proportion red \(\widehat{p}\) based on \(n=50\) sampled balls from the bowl, for which we want to estimate the unknown population proportion \(p\) of the \(N=2400\) balls that are red. Sampling distributions describe how values of the sample proportion red \(\widehat{p}\) will vary from sample to sample due to sampling variability and thus identify “typical” and “atypical” values of \(\widehat{p}\). For example

          -
            -
          • Obtaining a sample that yields \(\widehat{p} = 0.36\) would be considered typical, common, and plausible since it would in theory occur frequently.
          • -
          • Obtaining a sample that yields \(\widehat{p} = 0.8\) would be considered atypical, uncommon, and implausible since it lies far away from most of the distribution.
          • -
          -

          Let’s now ask ourselves the following questions:

          -
            -
          1. Where is the sampling distribution centered?
          2. -
          3. What is the spread of this sampling distribution?
          4. -
          -

          Recall from Section 4.3 the mean and the standard deviation are two summary statistics that would answer this question:

          -
          tactile_prop_red %>% 
          -  summarize(mean = mean(prop_red), sd = sd(prop_red))
          -
- TABLE 8.4: Comparing the standard deviations of the proportion red for different sample sizes.
- Columns: sample size, standard deviation
+ TABLE 8.4: Comparing standard deviations of proportions red for 3 different shovels.
+ Columns: Number of slots in shovel, Standard deviation of proportions red
          +

          8.3.2 Statistical definitions

          +

          Now for some important statistical definitions related to sampling. As a refresher of our 1000 repeated/replicated virtual samples of size \(n\) = 25, \(n\) = 50, and \(n\) = 100 in Section 8.2, let’s display Figure 8.11 again below.

          +

          +

          These types of distributions have a special name: sampling distributions; their visualization displays the effect of sampling variation on the distribution of any point estimate, in this case the sample proportion \(\widehat{p}\). Using these sampling distributions, for a given sample size \(n\), we can make statements about what values we can typically expect. For example, observe the centers of all three sampling distributions: they are all roughly centered around 0.4 = 40%. Furthermore, observe that while we are somewhat likely to observe sample proportions red of 0.2 = 20% when using the shovel with 25 slots, we will almost never observe this sample proportion when using the shovel with 100 slots. Observe also the effect of sample size on the sampling variation. As the sample size \(n\) increases from 25 to 50 to 100, the spread/variation of the sampling distribution decreases and thus the values cluster more and more tightly around the same center of around 40%. We quantified this spread/variation using the standard deviation of our proportions in Table 8.4, which we display again below:

          +
-
   mean     sd
  0.356  0.058

+
Number of slots in shovel   Standard deviation of proportions red
                       25                                    0.099
                       50                                    0.071
                      100                                    0.048
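As an alternative to the three separate summarize() calls shown in Section 8.2.4 (our addition, not the book’s code), the three standard deviations can be computed in a single pipeline, assuming the three data frames from the sketch given earlier:

# Stack the three sets of 1000 proportions, labelling each by its sample
# size, then compute one standard deviation per sample size.
bind_rows(
  virtual_prop_red_25 %>% mutate(n = 25),
  virtual_prop_red_50 %>% mutate(n = 50),
  virtual_prop_red_100 %>% mutate(n = 100)
) %>% 
  group_by(n) %>% 
  summarize(sd = sd(prop_red))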
          -

          Finally, it’s important to keep in mind:

          -
            -
          1. If the sampling is done in an unbiased and random fashion, in other words we made sure to stir the bowl before we sampled, then the sampling distribution will be guaranteed to be centered at the true unknown population proportion red \(p\), or in other words the true number of balls out of 2400 that are red.
          2. -
          3. The spread of this histogram, as quantified by the standard deviation of 0.058, is called the standard error. It quantifies the uncertainty of our estimates of \(p\), which recall are called \(\widehat{p}\). -
              -
            • Note: A large source of confusion. All standard errors are a form of standard deviation, but not all standard deviations are standard errors.
            • -
          4. -
          -
            -
          • sampling distribution
          • -
          • standard error
          • -
          - -

          Now let’s mimic the above tactile sampling, but with virtual sampling. We’ll resort to virtual sampling because while collecting 33 tactile samples manually is feasible, for large numbers like 1000, things start getting tiresome! That’s where a computer can really help: computers excel at performing mundane tasks repeatedly; think of what accounting software must be like!

          -

          In Figure 8.8, we can start seeing a pattern in the sampling distribution emerge. However, 33 values of the sample proportion \(\widehat{p}\) might not be enough to get a true sense of the distribution. Using 1000 values of \(\widehat{p}\) would definitely give a better sense. What are our two options for constructing these histograms?

          -
            -
          1. Tactile sampling: Make the 33 groups of students take \(1000 / 33 \approx 31\) samples of size \(n=50\) each, count the number of red balls for each of the 1000 tactile samples, and then compute the 1000 corresponding values of the sample proportion \(\widehat{p}\). However, this would be cruel and unusual as this would take hours!
          2. -
          3. Virtual sampling: Computers are very good at automating repetitive tasks such as this one. This is the way to go!
          4. -
          -

          First, generate 1000 samples of size \(n=50\)

          -
          virtual_samples <- bowl %>% 
          -  rep_sample_n(size = 50, reps = 1000)
          -View(virtual_samples)
          -

          Then for each of these 1000 samples of size \(n=50\), compute the corresponding sample proportions

          -
          virtual_prop_red <- virtual_samples %>% 
          -  group_by(replicate) %>% 
          -  summarize(red = sum(color == "red")) %>% 
          -  mutate(prop_red = red / 50)
          -View(virtual_prop_red)
          -

          As previously done, let’s plot the sampling distribution of these 1000 simulated values of the sample proportion red \(\widehat{p}\) with a histogram in Figure 8.10.

          -
          ggplot(virtual_prop_red, aes(x = prop_red)) +
          -  geom_histogram(binwidth = 0.05, color = "white") +
          -  labs(x = "Sample proportion red based on n = 50", 
          -       title = "Sampling distribution of p-hat") 
          -
          -Sampling distribution of 1000 sample proportions based on 1000 tactile samples with n=50 +

          So as the number of slots in the shovel increased, this standard deviation decreased. These types of standard deviations have another special name: standard errors; they quantify the effect of sampling variation induced on our estimates. In other words, they are quantifying how much we can expect different proportions of a shovel’s balls that are red to vary from random sample to random sample.

          +

          Unfortunately, many new statistics practitioners get confused by these names. For example, it’s common for people new to statistical inference to call the “sampling distribution” the “sample distribution”. Another additional source of confusion is the name “standard deviation” and “standard error”. Remember that a standard error is merely a kind of standard deviation: the standard deviation of any point estimate from a sampling scenario. In other words, all standard errors are standard deviations, but not all standard deviations are a standard error.

          +

          To help reinforce these concepts, let’s re-display Figure 8.11 but using our new terminology, notation, and definitions relating to sampling in Figure 8.12.

          +
          +Three sampling distributions of the sample proportion $\widehat{p}$.

- FIGURE 8.12: Sampling distribution of 1000 sample proportions based on 1000 tactile samples with n=50
+ FIGURE 8.12: Three sampling distributions of the sample proportion \(\widehat{p}\).

          -

          Since the sampling is random and thus representative and unbiased, the above sampling distribution is centered at the true population proportion red \(p\) of all \(N=2400\) balls in the bowl. Eyeballing it, the sampling distribution appears to be centered at around 0.375.

          -

          What is the standard error of the above sampling distribution of \(\widehat{p}\) based on 1000 samples of size \(n=50\)?

          -
          virtual_prop_red %>% 
          -  summarize(SE = sd(prop_red))
          -
          # A tibble: 1 x 1
          -      SE
          -   <dbl>
          -1 0.0702
          -

          What this value is saying might not be immediately apparent by itself to someone who is new to sampling. It’s best to first compare different standard errors for different sampling schemes based on different sample sizes \(n\). We’ll do so for samples of size \(n=25\), \(n=50\), and \(n=100\) next.

          -
          +

          Furthermore, let’s re-display Table 8.4 but using our new terminology, notation, and definitions relating to sampling in Table 8.5.

+
TABLE 8.5: Three standard errors of the sample proportion \(\widehat{p}\) based on n = 25, 50, 100.

Sample size   Standard error of \(\widehat{p}\)
n = 25        0.099
n = 50        0.071
n = 100       0.048

          Remember the key message of this last table: that as the sample size \(n\) goes up, the “typical” error of your point estimate as quantified by the standard error will go down.
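As a preview of the mathematical formulas mentioned at the close of this chapter (our addition at this point, not part of the original passage), the standard error of a sample proportion can be approximated by \(\sqrt{p(1-p)/n}\). Plugging in the bowl’s true proportion \(p = 0.375\) (revealed in the next subsection) reproduces the simulated values above quite closely: \(\sqrt{0.375 \times 0.625 / 25} \approx 0.097\), \(\sqrt{0.375 \times 0.625 / 50} \approx 0.068\), and \(\sqrt{0.375 \times 0.625 / 100} \approx 0.048\).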

          +
          +

          8.3.3 The moral of the story

          +

          Let’s recap this section so far. We’ve seen that if a sample is generated at random, then the resulting point estimate is a “good guess” of the true unknown population parameter. In our simulations, since we made sure to mix the balls first before extracting a sample with the shovel, the resulting sample proportion \(\widehat{p}\) of the shovel’s balls that were red was a “good guess” of the population proportion \(p\) of the bowl’s balls that were red.

          +

However, what do we mean by our point estimate being a “good guess”? Sometimes we’ll obtain a point estimate less than the true value of the unknown population parameter, while other times we’ll obtain a point estimate greater than the true value; this is because of sampling variation. However, despite this sampling variation, our point estimates will “on average” be correct. In our simulations, sometimes our sample proportion \(\widehat{p}\) was less than the true population proportion \(p\), while other times the sample proportion \(\widehat{p}\) was greater than the true population proportion \(p\). This was due to the sampling variability induced by the mixing. However, despite this sampling variation, our sample proportions \(\widehat{p}\) were always centered around the true population proportion. This is also known as having an accurate estimate.

          +

What was the value of the population proportion \(p\) of the \(N\) = 2400 balls in the actual bowl? There were 900 red balls, for a proportion red of 900/2400 = 0.375 = 37.5%! How do we know this? Did the authors do an exhaustive count of all the balls? No! The counts were listed on the contents of the box that the bowl came in. Hence we made the contents of the virtual bowl match the contents of the tactile bowl:

          +
          bowl %>% 
          +  summarize(sum_red = sum(color == "red"), 
          +            sum_not_red = sum(color != "red"))
          +
          # A tibble: 1 x 2
          +  sum_red sum_not_red
          +    <int>       <int>
          +1     900        1500
          +

          Let’s re-display our sampling distributions from Figures 8.11 and 8.12, but now with a vertical red line marking the true population proportion \(p\) of balls that are red = 37.5% in Figure 8.13. We see that while there is a certain amount of error in the sample proportions \(\widehat{p}\) for all three sampling distributions, on average the \(\widehat{p}\) are centered at the true population proportion red \(p\).

          +
          +Three sampling distributions with population proportion $p$ marked in red. +

          +FIGURE 8.13: Three sampling distributions with population proportion \(p\) marked in red. +

          -
          -

          8.5 Interpretation

          -

          At this point, you might be saying to yourself: “Big deal, why do we care about this bowl?” As hopefully you’ll soon come to appreciate, this sampling bowl exercise is merely a simulation representing the reality of many important sampling scenarios in a simplified and accessible setting. One in particular sampling scenario is familiar to many: polling. Whether for market research or for political purposes, polls inform much of the world’s decision and opinion making, and understanding the mechanism behind them can better inform you statistical citizenship. We’ll tie-in everything we learn in this chapter with an example relating to a 2013 poll on President Obama’s approval ratings among young adults in Section ??.

          +

          We also saw in this section that as your sample size \(n\) increases, your point estimates will vary less and less and be more and more concentrated around the true population parameter; this is quantified by the decreasing standard error. In other words, the typical error of your point estimates will decrease. In our simulations, as the sample size increases, the spread/variation of our sample proportions \(\widehat{p}\) around the true population proportion \(p\) decreases. You can observe this behavior as well in Figure 8.13. This is also known as having a more precise estimate.


          So random sampling ensures our point estimates are accurate, while having a large sample size ensures our point estimates are precise. While accuracy and precision may sound like the same concept, they are actually not. Accuracy relates to how “on target” our estimates are whereas precision relates to how “consistent” our estimates are. Figure 8.14 illustrates the difference.

FIGURE 8.14: Comparing accuracy and precision

At this point you might be asking yourself: “If you already knew the true proportion of the bowl’s balls that are red was 37.5%, then why did we do any of this?” In other words, “If you already knew the value of the true unknown population parameter, then why did we do any sampling?” You might also be asking: “Why did we take 1000 repeated/replicated samples of size n = 25, 50, and 100? Shouldn’t we be taking only one sample that’s as large as possible?” Recall our definition of a simulation from Section 8.2: an approximate imitation of the operation of a process or system. We performed these simulations to study:

1. The effect of sampling variation on our estimates.
2. The effect of sample size on sampling variation.

          In a real-life scenario, we won’t know what the true value of the population parameter is and furthermore we won’t take repeated/replicated samples but rather a single sample that’s as large as we can afford. This was also done to show the power of the technique of sampling when trying to estimate a population parameter. Since we knew the value was 37.5%, we could show just how well the different sample sizes approximated this value in their sampling distributions. We present one case study of a real-life sampling scenario in the next section: polling.




          8.4 Case study: Polls


On December 4, 2013, National Public Radio in the US reported on a then-recent poll of President Obama’s approval rating among young Americans aged 18-29 in an article, Poll: Support For Obama Among Young Americans Eroding. A quote from the article:

          After voting for him in large numbers in 2008 and 2012, young Americans are souring on President Obama.

          According to a new Harvard University Institute of Politics poll, just 41 percent of millennials — adults ages 18-29 — approve of Obama’s job performance, his lowest-ever standing among the group and an 11-point drop from April.


Let’s tie elements of the real-life poll in this news article with our “tactile” and “virtual” simulations from Sections 8.1 and 8.2, using the terminology, notation, and definitions we learned in Section 8.3:

1. (Study) Population: Who is the population of \(N\) individuals or observations of interest?
  • Simulation: \(N\) = 2400 identically-sized red and white balls
  • Obama poll: \(N\) = ? young Americans aged 18-29
2. Population parameter: What is the population parameter?
  • Simulation: The population proportion \(p\) of ALL the balls in the bowl that are red.
  • Obama poll: The population proportion \(p\) of ALL young Americans who approve of Obama’s job performance.
3. Census: What would a census look like?
  • Simulation: Manually going over all \(N\) = 2400 balls and exactly computing the population proportion \(p\) of the balls that are red, a time-consuming task.
  • Obama poll: Locating all \(N\) = ? young Americans and asking them all if they approve of Obama’s job performance, an expensive task.
4. Sampling: How do you collect the sample of size \(n\) individuals or observations?
  • Simulation: Using a shovel with \(n\) slots.
  • Obama poll: One method is to get a list of phone numbers of all young Americans and pick out \(n\) phone numbers. In this poll’s case, the sample size was \(n\) = 2089 young Americans.
5. Point estimate (AKA sample statistic): What is your estimate of the unknown population parameter?
  • Simulation: The sample proportion \(\widehat{p}\) of the balls in the shovel that were red.
  • Obama poll: The sample proportion \(\widehat{p}\) of young Americans in the sample that approve of Obama’s job performance. In this poll’s case, \(\widehat{p}\) = 0.41 = 41%, the quoted percentage in the second paragraph of the article.
6. Representative sampling: Is the sampling procedure representative?
  • Simulation: Are the contents of the shovel representative of the contents of the bowl?
  • Obama poll: Is the sample of \(n\) = 2089 young Americans representative of all young Americans aged 18-29?
7. Generalizability: Are the samples generalizable to the greater population?
  • Simulation: Is the sample proportion \(\widehat{p}\) of the shovel’s balls that are red a “good guess” of the population proportion \(p\) of the bowl’s balls that are red?
  • Obama poll: Is the sample proportion \(\widehat{p}\) = 0.41 of the sample of young Americans who support Obama a “good guess” of the population proportion \(p\) of all young Americans who support Obama? In other words, can we confidently say that 41% of all young Americans approve of Obama?
8. Bias: Is the sampling procedure unbiased? In other words, do all observations have an equal chance of being included in the sample?
  • Simulation: Since each ball was equally sized, each ball had an equal chance of being included in a shovel’s sample, and hence the sampling was unbiased.
  • Obama poll: Did all young Americans have an equal chance at being represented in this poll? For example, if this was conducted using only mobile phone numbers, would people without mobile phones be included? What if those who disapproved of Obama were less likely to agree to take part in the poll? What about if this were an internet poll on a certain news website? Would non-readers of this website be included? We need to ask the Harvard University Institute of Politics pollsters about their sampling methodology.
9. Random sampling: Was the sampling random?
  • Simulation: As long as you mixed the bowl sufficiently before sampling, your samples would be random.
  • Obama poll: Was the sample conducted at random? We need to ask the Harvard University Institute of Politics pollsters about their sampling methodology.


          Once again, let’s revisit the sampling paradigm:

• If the sampling of a sample of size \(n\) is done at random, then
• the sample is unbiased and representative of the population of size \(N\), thus
• any result based on the sample can generalize to the population, thus
• the point estimate is a “good guess” of the unknown population parameter, thus
• instead of performing a census, we can infer about the population using sampling.


          In our simulations using the shovel with 50 slots:

• If we extract a sample of \(n\) = 50 balls at random, in other words we mix the equally-sized balls before using the shovel, then
• the contents of the shovel are an unbiased representation of the contents of the bowl’s 2400 balls, thus
• any result based on the sample of balls can generalize to the bowl, thus
• the sample proportion \(\widehat{p}\) of the \(n\) = 50 balls in the shovel that are red is a “good guess” of the population proportion \(p\) of the \(N\) = 2400 balls that are red, thus
• instead of manually going over all the balls in the bowl, we can infer about the bowl using the shovel.


In the real-life Obama poll:

• If we had a way of contacting a randomly chosen sample of 2089 young Americans and polling their approval of Obama, then
• these 2089 young Americans would be an unbiased and representative sample of all young Americans, thus
• any results based on this sample of 2089 young Americans can generalize to the entire population of all young Americans, thus
• the reported sample approval rating of 41% of these 2089 young Americans is a “good guess” of the true approval rating among all young Americans, thus
• instead of performing a highly costly census of all young Americans, we can infer about all young Americans using polling.

So long story short, this poll’s guess of Obama’s approval rating was 41%. However, is this the end of the story when it comes to understanding the results of a poll? If you read further in the article, it states:


          The online survey of 2,089 adults was conducted from Oct. 30 to Nov. 11, just weeks after the federal government shutdown ended and the problems surrounding the implementation of the Affordable Care Act began to take center stage. The poll’s margin of error was plus or minus 2.1 percentage points.


Note the term margin of error, which here is plus or minus 2.1 percentage points. This is saying that a typical range of errors for polls of this type is about \(\pm 2.1\%\), in other words from about 2.1% too small to about 2.1% too big. These errors are caused by sampling variation, the same sampling variation you saw studied in the histograms in Section ?? on our tactile sampling simulations and Section ?? on our virtual sampling simulations.


In the case of polls, any variation from the true approval rating is an “error”, and a reasonable range of errors is the margin of error. We’ll see in the next chapter that this is what’s known as a 95% confidence interval for the unknown approval rating. We’ll study confidence intervals using a new package for our data science and statistical toolbox: the infer package for statistical inference.



          8.5 Conclusion


          8.5.1 Central Limit Theorem


What you did in Sections 8.1 and 8.2 (in particular in Figure 8.11 and Table 8.4) was demonstrate a very famous theorem, or mathematically proven truth, called the Central Limit Theorem. It loosely states that when sample means and sample proportions are based on larger and larger sample sizes, the sampling distributions of these two point estimates become more and more normally shaped and more and more narrow. In other words, their sampling distributions become more normally distributed and the spread/variation of these sampling distributions, as quantified by their standard errors, gets smaller. Shuyi Chiou, Casey Dunn, and Pathikrit Bhattacharyya created the following 3m38s video at https://www.youtube.com/embed/jvoxEYmQHNM explaining this crucial statistical theorem using the average weight of wild bunny rabbits and the average wing span of dragons as examples. Enjoy!
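As a rough illustration of the Central Limit Theorem in action, and assuming the pennies data frame with its age_in_2011 variable and rep_sample_n() from the moderndive package, one could compare sample means based on a small and a larger sample size:

library(dplyr)
library(ggplot2)
library(moderndive)

# Sample means based on n = 5 vs n = 40 pennies: the n = 40 sampling
# distribution should look more bell-shaped and less spread out.
sample_means <- bind_rows(
  rep_sample_n(pennies, size = 5, reps = 1000) %>% mutate(n = 5),
  rep_sample_n(pennies, size = 40, reps = 1000) %>% mutate(n = 40)
) %>%
  group_by(n, replicate) %>%
  summarize(mean_age = mean(age_in_2011))

ggplot(sample_means, aes(x = mean_age)) +
  geom_histogram(bins = 20, color = "white") +
  facet_wrap(~ n)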



          8.5.2 Summary table


In this chapter, we performed both tactile and virtual simulations of sampling to infer about an unknown proportion. We also presented a case study of sampling in a real-life situation: polls. In both cases, we used the sample proportion \(\widehat{p}\) to estimate the population proportion \(p\). However, we are not just limited to scenarios related to statistical inference for proportions. In other words, we can consider population parameter and point estimate scenarios other than the population proportion \(p\) and sample proportion \(\widehat{p}\) we studied in this chapter. We present 5 more such scenarios in Table 8.6.


Note that the sample mean is traditionally denoted as \(\overline{x}\), but since it can also be thought of as an estimate of the population mean \(\mu\), it can also be denoted as \(\widehat{\mu}\), as shown below in the table.
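For instance, as a small sketch assuming the pennies_sample data frame from the moderndive package with its age_in_2011 variable, the sample mean \(\overline{x}\) (equivalently \(\widehat{\mu}\)) is computed as:

library(dplyr)
library(moderndive)

# The sample mean x-bar of the pennies in pennies_sample, our point
# estimate of the population mean age mu of all pennies.
pennies_sample %>%
  summarize(x_bar = mean(age_in_2011))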

          @@ -1776,42 +1778,35 @@


TABLE 8.6: Scenarios of sampling for inference


          We’ll cover all the remaining scenarios as follows, using the terminology, notation, and definitions related to sampling you saw in Section 8.3:

• In Chapter 9, we’ll cover examples of statistical inference for
  • Scenario 2: The mean age \(\mu\) of all pennies in circulation in the US.
  • Scenario 3: The difference \(p_1 - p_2\) in the proportion of people who yawn when seeing someone else yawn and the proportion of people who yawn without seeing someone else yawn. This is an example of two-sample inference.
• In Chapter 10, we’ll cover an example of statistical inference for
  • Scenario 4: The difference \(\mu_1 - \mu_2\) in average IMDB ratings for action and romance movies. This is another example of two-sample inference.
• In Chapter 11, we’ll cover an example of statistical inference for the relationship between teaching score and various instructor demographic variables you saw in Chapter 6 on basic regression and Chapter 7 on multiple regression. Specifically:
  • Scenario 5: The intercept \(\beta_0\) of some population regression line.
  • Scenario 6: The slope \(\beta_1\) of some population regression line.

In Chapter 11 on inference for regression, we’ll cover Scenarios 5 & 6 about the regression line. In particular, we’ll see that the fitted regression line from Chapter 6 on basic regression, \(\widehat{y} = b_0 + b_1 \cdot x\), is in fact an estimate of some true population regression line \(y = \beta_0 + \beta_1 \cdot x\) based on a sample of \(n\) pairs of points \((x, y)\). Ex: Recall our sample of \(n=463\) instructors at UT Austin from the evals data set in Chapter 6. Based on the results of the fitted regression model of teaching score with beauty score as an explanatory/predictor variable, what can we say about this relationship for all instructors, not just those at UT Austin?
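As a sketch of what such an estimate looks like in code, assuming the evals data frame and the get_regression_table() function from the moderndive package, the fitted values \(b_0\) and \(b_1\) can be obtained with:

library(moderndive)

# Fit the regression of teaching score on beauty score for the sample
# of n = 463 UT Austin instructors. The resulting intercept b0 and
# slope b1 are point estimates of the population parameters beta0 and beta1.
score_model <- lm(score ~ bty_avg, data = evals)
get_regression_table(score_model)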


In most cases, we don’t have the population values as we did with the bowl of balls. We only have a single sample of data from a larger population. We’d like to be able to make some reasonable guesses about population parameters, using that single sample to create a range of plausible values for a population parameter. This range of plausible values is known as a confidence interval and will be the focus of the next chapter. And how do we use a single sample to get some idea of how other samples might vary in terms of their statistic values? One common way this is done is via a process known as bootstrapping, which will be the focus of the beginning sections of the next chapter.
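As a preview, here is a minimal sketch of such a bootstrap-based confidence interval using the infer package, assuming the pennies_sample data frame with its age_in_2011 variable from the moderndive package:

library(infer)
library(moderndive)

# Resample the single sample of pennies with replacement 1000 times,
# compute the mean of each resample, and take the middle 95% of those
# means as a range of plausible values for the population mean mu.
pennies_sample %>%
  specify(response = age_in_2011) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean") %>%
  get_ci(level = 0.95, type = "percentile")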





          8.7.5 Closing notes


          This chapter serves as an introduction to the theoretical underpinning of the statistical inference techniques that will be discussed in greater detail in Chapter 9 for confidence intervals and Chapter 10 for hypothesis testing.


          8.5.3 Additional resources

          An R script file of all R code used in this chapter is available here.


          8.5.4 What’s to come?


          Recall in our Obama poll case study in Section 8.4 that based on this particular sample, the Harvard University Institute of Politics’ best guess of Obama’s approval rating among all young Americans was 41%. However, this isn’t the end of the story. If you read further in the article, it states:


          The online survey of 2,089 adults was conducted from Oct. 30 to Nov. 11, just weeks after the federal government shutdown ended and the problems surrounding the implementation of the Affordable Care Act began to take center stage. The poll’s margin of error was plus or minus 2.1 percentage points.


Note the term margin of error, which here is plus or minus 2.1 percentage points. What this is saying is that most polls won’t get it perfectly right; there will always be a certain amount of error caused by sampling variation. The margin of error of plus or minus 2.1 percentage points is saying that a typical range of errors for polls of this type is about \(\pm\) 2.1%, in other words from about 2.1% too small to about 2.1% too big, for an interval of [41% - 2.1%, 41% + 2.1%] = [38.9%, 43.1%]. Remember that this notation corresponds to 38.9% and 43.1% being included, as well as all numbers between the two of them. We’ll see in the next chapter that such intervals are known as confidence intervals.
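The arithmetic behind this interval is simple enough to verify directly; a quick sketch in R:

# Reported point estimate and margin of error from the poll:
p_hat <- 0.41
moe <- 0.021

# Range of plausible values for the true approval rating p:
c(lower = p_hat - moe, upper = p_hat + moe)
# lower upper 
# 0.389 0.431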

diff --git a/docs/9-confidence-intervals.html b/docs/9-confidence-intervals.html
index 39a0822a0..94c42a1d9 100644
--- a/docs/9-confidence-intervals.html
+++ b/docs/9-confidence-intervals.html
@@ -1007,7 +1009,7 @@

          9.4 Comparing bootstrap and sampl

To help build up the idea of a confidence interval, we weren’t completely honest in our initial discussion. The pennies_sample data frame represents a sample from a larger number of pennies stored as pennies in the moderndive package. The pennies data frame (also in the moderndive package) contains 800 rows of data and two columns pertaining to the same variables as pennies_sample. Let’s begin by understanding some of the properties of the age_in_2011 variable in the pennies data frame.

          ggplot(pennies, aes(x = age_in_2011)) +
             geom_histogram(bins = 10, color = "white")
          -

          +

          pennies %>% 
             summarize(mean_age = mean(age_in_2011),
                       median_age = median(age_in_2011))
          @@ -1018,7 +1020,7 @@

          9.4 Comparing bootstrap and sampl

          We see that pennies is slightly right-skewed with the mean being pulled towards the upper outliers. Recall that pennies_sample was more symmetric than pennies. In fact, it actually exhibited some left-skew as we compare the mean and median values.

          ggplot(pennies_sample, aes(x = age_in_2011)) +
             geom_histogram(bins = 10, color = "white")
          -

          +

          pennies_sample %>% 
             summarize(mean_age = mean(age_in_2011), median_age = median(age_in_2011))
          # A tibble: 1 x 2
          @@ -1040,8 +1042,8 @@ 

          Sampling distribution

          -->
          ggplot(sampling_distribution, aes(x = stat)) +
             geom_histogram(bins = 10, fill = "salmon", color = "white")
          -

          FIGURE 9.1: Sampling distribution for n=40 samples of pennies

          @@ -1059,7 +1061,7 @@

          Bootstrap distribution

          Let’s now see how the shape of the bootstrap distribution compares to that of the sampling distribution. We’ll shade the bootstrap distribution blue to further assist with remembering which is which.

          bootstrap_distribution %>% 
             visualize(bins = 10, fill = "blue")
          -

          +

          bootstrap_distribution %>% 
             summarize(se = sd(stat))
          # A tibble: 1 x 1
          @@ -1103,17 +1105,18 @@ 

9.5 Interpreting the confidence i

  specify(formula = age_in_2011 ~ NULL) %>% 
  generate(reps = 1000) %>% 
  calculate(stat = "mean") %>% 
  get_ci()
Setting `type = "bootstrap"` in `generate()`.
percentile_ci2
          # A tibble: 1 x 2
             `2.5%` `97.5%`
              <dbl>   <dbl>
           1   18.4    25.3

This new confidence interval also contains the value of \(\mu\). Let’s further investigate by repeating this process 100 times to get 100 different confidence intervals derived from 100 different samples of pennies. Each sample will have a size of 40, just as the original sample did. We will plot each of these confidence intervals as horizontal lines. We will also show a line corresponding to the known population value of 21.152 years.

          -

          +

          Of the 100 confidence intervals based on samples of size \(n = 40\), 96 of them captured the population mean \(\mu = 21.152\), whereas 4 of them did not include it. If we repeated this process of building confidence intervals more times with more samples, we’d expect 95% of them to contain the population mean. In other words, the procedure we have used to generate confidence intervals is “95% reliable” in that we can expect it to include the true population parameter 95% of the time if the process is repeated.

          To further accentuate this point, let’s perform a similar procedure using 90% confidence intervals instead. This time we will use the standard error method instead of the percentile method for computing the confidence intervals.

          -

          +

Of the 100 confidence intervals based on samples of size \(n = 40\), 87 of them captured the population mean \(\mu = 21.152\), whereas 13 of them did not include it. Repeating this process for more samples would result in us getting closer and closer to 90% of the confidence intervals including the true value. When interpreting a confidence interval, it is common to say we are “95% confident” or “90% confident” that the true value falls within the range of the specified confidence interval. We will use this “confident” language throughout the rest of this chapter, but remember that it has more to do with a measure of reliability of the building process.

          Back to our pennies example

          @@ -1160,6 +1163,7 @@

          9.6.2 Bootstrap distribution

          tactile_shovel1 %>% 
             specify(formula = color ~ NULL, success = "red") %>% 
             generate(reps = 10000)
          +
          Setting `type = "bootstrap"` in `generate()`.

          This results in 50 rows for each of the 10,000 replicates. Lastly, we finish the infer pipeline by adding back in the calculate() step.

          bootstrap_props <- tactile_shovel1 %>% 
             specify(formula = color ~ NULL, success = "red") %>% 
          @@ -1168,7 +1172,7 @@ 

          9.6.2 Bootstrap distribution

          Let’s visualize() what the resulting bootstrap distribution looks like as a histogram. We’ve adjusted the number of bins here as well to better see the resulting shape.

          bootstrap_props %>% 
             visualize(bins = 25)
          -

          +

          We see that the resulting distribution is symmetric and bell-shaped so it doesn’t much matter which confidence interval method we choose. Let’s use the standard error method to create a 95% confidence interval.

          standard_error_ci <- bootstrap_props %>% 
             get_ci(type = "se", level = 0.95, point_estimate = p_hat)
          @@ -1179,7 +1183,7 @@ 

          9.6.2 Bootstrap distribution

          1 0.284 0.556
          bootstrap_props %>% 
             visualize(bins = 25, endpoints = standard_error_ci)
          -

          +

          We are 95% confident that the true proportion of red balls in the bowl is between 0.284 and 0.556. This level of confidence is based on the standard error-based method including the true proportion 95% of the time if many different samples (not just the one we used) were collected and confidence intervals were created.

          @@ -1230,7 +1234,7 @@

          Confidence intervals based on 33 tactile samples

          conf_ints
          @@ -2301,9 +2305,10 @@

          9.7.2 Bootstrap distribution

specify(formula = yawn ~ group, success = "yes") %>% 
  generate(reps = 1000) %>% 
  calculate(stat = "diff in props", order = c("seed", "control"))
          Setting `type = "bootstrap"` in `generate()`.
          bootstrap_distribution %>% 
             visualize(bins = 20)
          -

          +

          This distribution is roughly symmetric and bell-shaped but isn’t quite there. Let’s use the percentile-based method to compute a 95% confidence interval for the true difference in the proportion of those that yawn with and without a seed presented. The arguments are explicitly listed here but remember they are the defaults and simply get_ci() can be used.

          bootstrap_distribution %>% 
             get_ci(type = "percentile", level = 0.95)
          @@ -2329,11 +2334,11 @@

          9.7.2 Bootstrap distribution

          9.8 Conclusion

          -
          +

          9.8.1 What’s to come?

          This chapter introduced the notions of bootstrapping and confidence intervals as ways to build intuition about population parameters using only the original sample information. We also concluded with a glimpse into statistical significance and we’ll dig much further into this in Chapter 10 up next!

          -
          +

          9.8.2 Script of R code

          An R script file of all R code used in this chapter is available here.

diff --git a/docs/A-appendixA.html b/docs/A-appendixA.html
index 857c2c81a..3f49a0128 100644
--- a/docs/A-appendixA.html
+++ b/docs/A-appendixA.html

          A.2 Normal distribution discussion

          +
diff --git a/docs/B-appendixB.html b/docs/B-appendixB.html
index 9508e1c7c..94ba3e2f6 100644
--- a/docs/B-appendixB.html
+++ b/docs/B-appendixB.html
null_distn_one_mean %>% visualize()
          -

          +

          We can next use this distribution to observe our \(p\)-value. Recall this is a right-tailed test so we will be looking for values that are greater than or equal to 23.44 for our \(p\)-value.

          null_distn_one_mean %>%
             visualize(obs_stat = x_bar, direction = "greater")
          -

          +

          Calculate \(p\)-value
          pvalue <- null_distn_one_mean %>%
          @@ -746,7 +739,7 @@ 

          Bootstrapping for confidence interval

          1 23.3 23.6
          boot_distn_one_mean %>% 
             visualize(endpoints = ci, direction = "between")
          -

          +

          We see that 23 is not contained in this confidence interval as a plausible value of \(\mu\) (the unknown population mean) and the entire interval is larger than 23. This matches with our hypothesis test results of rejecting the null hypothesis in favor of the alternative (\(\mu > 23\)).

          Interpretation: We are 95% confident the true mean age of first marriage for all US women from 2006 to 2010 is between 23.316 and 23.565.


          @@ -878,11 +871,11 @@

          Simulation for hypothesis test

          generate(reps = 10000) %>% calculate(stat = "prop")
          null_distn_one_prop %>% visualize()
          -

          +

          We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are 0.8 - 0.73 = 0.07 away from 0.8 in BOTH directions for our \(p\)-value:

          null_distn_one_prop %>% 
             visualize(obs_stat = p_hat, direction = "both")
          -

          +

          Calculate \(p\)-value
          pvalue <- null_distn_one_prop %>% 
          @@ -919,7 +912,7 @@ 

          Bootstrapping for confidence interval

          1 0.64 0.81
          boot_distn_one_prop %>% 
             visualize(endpoints = ci, direction = "between")
          -

          +

          We see that 0.80 is contained in this confidence interval as a plausible value of \(\pi\) (the unknown population proportion). This matches with our hypothesis test results of failing to reject the null hypothesis.

          Interpretation: We are 95% confident the true proportion of customers who are satisfied with the service they receive is between 0.64 and 0.81.


          @@ -1076,11 +1069,11 @@

          Randomization for hypothesis test

          generate(reps = 10000) %>% calculate(stat = "diff in props", order = c("yes", "no"))
          null_distn_two_props %>% visualize()
          -

          +

          We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are greater than or equal to -0.099 or less than or equal to 0.099 for our \(p\)-value.

          null_distn_two_props %>% 
             visualize(obs_stat = d_hat, direction = "two_sided")
          -

          +

          Calculate \(p\)-value
          pvalue <- null_distn_two_props %>% 
          @@ -1109,7 +1102,7 @@ 

          Bootstrapping for confidence interval

          1 -0.161 -0.0378
          boot_distn_two_props %>% 
             visualize(endpoints = ci, direction = "between")
          -

          +

          We see that 0 is not contained in this confidence interval as a plausible value of \(\pi_{college} - \pi_{no\_college}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter, we have evidence that the proportion of college graduates in California with no opinion on drilling is different than that of non-college graduates.

Interpretation: We are 95% confident the true proportion of non-college graduates with no opinion on offshore drilling in California is between 0.16 and 0.04 lower than that of college graduates.


          @@ -1348,11 +1341,11 @@

          Randomization for hypothesis test

          calculate(stat = "diff in means", order = c("Sacramento_ CA", "Cleveland_ OH"))
          null_distn_two_means %>% visualize()
          -

          +

          We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are greater than or equal to 4960.477 or less than or equal to -4960.477 for our \(p\)-value.

          null_distn_two_means %>% 
             visualize(obs_stat = d_hat, direction = "both")
          -

          +

          Calculate \(p\)-value
          pvalue <- null_distn_two_means %>% 
          @@ -1382,7 +1375,7 @@ 

          Bootstrapping for confidence interval

          1 -1446. 11308.
          boot_distn_two_means %>% 
             visualize(endpoints = ci, direction = "between")
          -

          +

          We see that 0 is contained in this confidence interval as a plausible value of \(\mu_{sac} - \mu_{cle}\) (the unknown population parameter). This matches with our hypothesis test results of failing to reject the null hypothesis. Since zero is a plausible value of the population parameter, we do not have evidence that Sacramento incomes are different than Cleveland incomes.

Interpretation: We are 95% confident the true mean yearly income for those living in Sacramento is between 1445.53 dollars lower and 11307.82 dollars higher than for Cleveland.

          Note: You could also use the null distribution based on randomization with a shift to have its center at \(\bar{x}_{sac} - \bar{x}_{cle} = \$4960.48\) instead of at 0 and calculate its percentiles. The confidence interval produced via this method should be comparable to the one done using bootstrapping above.

          @@ -1528,11 +1521,11 @@

          Bootstrapping for hypothesis test

          generate(reps = 10000) %>% calculate(stat = "mean")
          null_distn_paired_means %>% visualize()
          -

          +

          We can next use this distribution to observe our \(p\)-value. Recall this is a left-tailed test so we will be looking for values that are less than or equal to 4960.477 for our \(p\)-value.

          null_distn_paired_means %>% 
             visualize(obs_stat = d_hat, direction = "less")
          -

          +

          Calculate \(p\)-value
          pvalue <- null_distn_paired_means %>% 
          @@ -1561,7 +1554,7 @@ 

          Bootstrapping for confidence interval

          1 -0.112 -0.0503
          boot_distn_paired_means %>% 
             visualize(endpoints = ci, direction = "between")
          -

          +

          We see that 0 is not contained in this confidence interval as a plausible value of \(\mu_{diff}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter and since the entire confidence interval falls below zero, we have evidence that surface zinc concentration levels are lower, on average, than bottom level zinc concentrations.

Interpretation: We are 95% confident the true mean zinc concentration on the surface is between 0.11 and 0.05 units lower than on the bottom.


diff --git a/docs/C-appendixC.html b/docs/C-appendixC.html
index f7f5eb315..7169fbe9d 100644
--- a/docs/C-appendixC.html
+++ b/docs/C-appendixC.html

              C.1 Sorted barplots

              ggplot(data = flights, mapping = aes(x = carrier)) +
                 geom_bar() +
                 scale_x_discrete(limits = names(sorted_flights))
              -

              FIGURE C.1: Number of flights departing NYC in 2013 by airline - Descending numbers

              @@ -587,8 +580,8 @@

              C.2.1 Interactive linegraphs

              rownames(flights_summarized) <- flights_summarized$date flights_summarized <- select(flights_summarized, -date) dyRangeSelector(dygraph(flights_summarized))
              -
              - +
              +


The syntax here is a little different than what we have covered so far. The dygraph function expects the dates to be given as the rownames of the object. We then remove the date variable from the flights_summarized data frame since it is accounted for in the rownames. Lastly, we run the dygraph function on the new data frame that only contains the median arrival delay as a column, and then provide a selector to zoom in on the interactive plot via dyRangeSelector. (Note that this plot will only be interactive in the HTML version of this book.)

              D.1 Chapter 2 Solutions

              library(dplyr)
               library(ggplot2)
               library(nycflights13)

              (LC2.1) Repeat the above installing steps, but for the dplyr, nycflights13, and knitr packages. This will install the earlier mentioned dplyr package, the nycflights13 package containing data on all domestic flights leaving a NYC airport in 2013, and the knitr package for writing reports in R.

              +

              (LC2.2) “Load” the dplyr, nycflights13, and knitr packages as well by repeating the above steps.

              +

              Solution: If the following code runs with no errors, you’ve succeeded!

library(dplyr)
library(nycflights13)
library(knitr)

              (LC2.3) What does any ONE row in this flights dataset refer to?

              • A. Data on an airline
              • B. Data on a flight
              • @@ -555,7 +556,7 @@

                D.1 Chapter 2 Solutions

              • a flight path would be United 1545 to Houston
              • a flight would be United 1545 to Houston at a specific date/time. For example: 2013/1/1 at 5:15am.

              (LC2.4) What are some examples in this dataset of categorical variables? What makes them different than quantitative variables?

              Solution: Hint: Type ?flights in the console to see what all the variables mean!

              • Categorical: @@ -570,13 +571,21 @@

                D.1 Chapter 2 Solutions

              • time_hour time

            (LC2.5) What does int, dbl, and chr mean in the output above?

            Solution:

            • int: integer. Used to count things i.e. a discrete value. Ex: the # of cars parked in a lot
            • dbl: double. Used to measure things. i.e. a continuous value. Ex: your height in inches
            • chr: character. i.e. text
            +

            (LC2.6) What properties of the observational unit do each of lat, lon, alt, tz, dst, and tzone describe for the airports data frame? Note that you may want to use ?airports to get more information.

            +

Solution: lat and lon represent the airport’s geographic coordinates, alt is the altitude above sea level of the airport (Run airports %>% filter(faa == "DEN") to see the altitude of Denver International Airport), tz is the time zone difference with respect to GMT in London UK, dst is the daylight savings time zone, and tzone is the time zone label.

            +

            (LC2.7) Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions.

            +

            Solution:

            +
              +
            • In the weather example in LC3.8, the combination of origin, year, month, day, hour are identification variables as they identify the observation in question.
            • +
            • Anything else pertains to observations: temp, humid, wind_speed, etc.
            • +

        37. @@ -598,7 +607,7 @@

          D.2 Chapter 3 Solutions

Solution: There are many possibilities for this one; see the plot below. Is there a pattern in departure delay depending on when the flight is scheduled to depart? Interestingly, there seem to be only two blocks of time when flights depart.

          ggplot(data = alaska_flights, mapping = aes(x = dep_time, y = dep_delay)) +
             geom_point()
          -

          +

          (LC3.7) Why is setting the alpha argument value useful with scatterplots? What further information does it give you that a regular scatterplot cannot?

          Solution: Why is setting the alpha argument value useful with scatterplots? What further information does it give you that a regular scatterplot cannot? It thins out the points so we address overplotting. But more importantly it hints at the (statistical) density and distribution of the points: where are the points concentrated, where do they occur. We will see more about densities and distributions in Chapter 6 when we switch gears to statistical topics.

          (LC3.8) After viewing the Figure 3.4 above, give an approximate range of arrival delays and departure delays that occur the most frequently. How has that region changed compared to when you observed the same plot without the alpha = 0.2 set in Figure 3.2?

          @@ -615,8 +624,8 @@

          D.2 Chapter 3 Solutions

Solution: Plot a time series of a variable other than temp for Newark Airport in the first 15 days of January 2013. Humidity is a good one to look at, since it is very closely related to the cycles of a day.

          ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = humid)) +
             geom_line()
          -

          -

          (LC3.14) What does changing the number of bins from 30 to 60 tell us about the distribution of temperatures?

          +

          +

          (LC3.14) What does changing the number of bins from 30 to 40 tell us about the distribution of temperatures?

Solution: The distribution doesn’t change much. But by refining the bin width, we see that the temperature data has a high degree of accuracy. What do I mean by accuracy? Looking at the temp variable by View(weather), we see that the precision of each temperature recording is 2 decimal places.

          (LC3.15) Would you classify the distribution of temperatures as symmetric or skewed?

          Solution: It is rather symmetric, i.e. there are no long tails on only one side of the distribution

          @@ -644,7 +653,7 @@

          D.2 Chapter 3 Solutions

          (LC3.20) For which types of data-sets would these types of faceted plots not work well in comparing relationships between variables? Give an example describing the nature of these variables and other important characteristics.

          Solution:

          • We’d have 365 facets to look at. Way too many.
          • We don’t really care about day-to-day fluctuation in weather so much, but maybe more week-to-week variation. We’d like to focus on seasonal trends.

          (LC3.21) Does the temp variable in the weather data-set have a lot of variability? Why do you say that?

          @@ -653,12 +662,12 @@

          D.2 Chapter 3 Solutions

          Solution: It appears to be an outlier. Let’s revisit the use of the filter command to hone in on it. We want all data points where the month is 5 and temp<25

          weather %>% 
             filter(month==5 & temp < 25)
# A tibble: 1 x 16
   origin  year month   day  hour  temp  dewp humid wind_dir wind_speed wind_gust
   <chr>  <dbl> <dbl> <int> <int> <dbl> <dbl> <dbl>    <dbl>      <dbl>     <dbl>
 1 JFK     2013     5     8    22  13.1  12.0  95.3       80       8.06        NA
# … with 5 more variables: precip <dbl>, pressure <dbl>, visib <dbl>,
#   time_hour <dttm>, temp_in_C <dbl>

          There appears to be only one hour and only at JFK that recorded 13.1 F (-10.5 C) in the month of May. This is probably a data entry mistake! Why wasn’t the weather at least similar at EWR (Newark) and LGA (La Guardia)?

          (LC3.23) Which months have the highest variability in temperature? What reasons do you think this is?

          Solution: We are now interested in the spread of the data. One measure some of you may have seen previously is the standard deviation. But in this plot we can read off the Interquartile Range (IQR):

          @@ -791,8 +800,8 @@

          D.2 Chapter 3 Solutions

TABLE 9.2: 33 confidence intervals from 33 tactile samples of size n=50

          (LC3.24) We looked at the distribution of a numerical variable over a categorical variable here with this boxplot. Why can’t we look at the distribution of one numerical variable over the distribution of another numerical variable? Say, temperature across pressure, for example?

          -

          Solution: Because we need a way to group many numerical observations together, say by grouping by month. For pressure, we have near unique values for pressure, i.e. no groups, so we can’t make boxplots.

          +

          (LC3.24) We looked at the distribution of the numerical variable temp split by the numerical variable month that we converted to a categorical variable using the factor() function. Why would a boxplot of temp split by the numerical variable pressure similarly converted to a categorical variable using the factor() not be informative?

          +

          Solution: Because there are 12 unique values of month yielding only 12 boxes in our boxplot. There are many more unique values of pressure (469 unique values in fact), because values are to the first decimal place. This would lead to 469 boxes, which is too many for people to digest.

          (LC3.25) Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram?

Solution: In a histogram, the bin corresponding to where an outlier lies may not be high enough for us to see. In a boxplot, outliers are explicitly labelled separately.

          (LC3.26) Why are histograms inappropriate for visualizing categorical variables?

          @@ -825,145 +834,8 @@

          D.2 Chapter 3 Solutions

          D.3 Chapter 4 Solutions

          library(dplyr)
           library(ggplot2)
          -library(nycflights13)
          -library(tidyr)
          -library(readr)
          -

          (LC4.1) Consider the following data frame of average number of servings of beer, spirits, and wine consumption in three countries as reported in the FiveThirtyEight article Dear Mona Followup: Where Do People Drink The Most Beer, Wine And Spirits?

          -
          # A tibble: 3 x 4
          -  country     beer_servings spirit_servings wine_servings
          -  <chr>               <int>           <int>         <int>
          -1 Canada                240             122           100
          -2 South Korea           140              16             9
          -3 USA                   249             158            84
          -

          This data frame is not in tidy format. What would it look like if it were?

          -

          Solution: There are three variables of information included: country, alcohol type, and number of servings. In tidy format, each of these variables of information are included in their own column.

          -
          # A tibble: 9 x 3
          -  country     `alcohol type` servings
          -  <chr>       <chr>             <int>
          -1 Canada      beer                240
          -2 Canada      spirit              122
          -3 Canada      wine                100
          -4 South Korea beer                140
          -5 South Korea spirit               16
          -6 South Korea wine                  9
          -7 USA         beer                249
          -8 USA         spirit              158
          -9 USA         wine                 84
          -

          Note that how the rows are sorted is inconsequential in whether or not the data frame is in tidy format. In other words, the following data frame sorted by alcohol type instead of country is equally in tidy format.

          -
          # A tibble: 9 x 3
          -  country     `alcohol type` servings
          -  <chr>       <chr>             <int>
          -1 Canada      beer                240
          -2 South Korea beer                140
          -3 USA         beer                249
          -4 Canada      spirit              122
          -5 South Korea spirit               16
          -6 USA         spirit              158
          -7 Canada      wine                100
          -8 South Korea wine                  9
          -9 USA         wine                 84
          -

          (LC4.2) What properties of the observational unit do each of lat, lon, alt, tz, dst, and tzone describe for the airports data frame? Note that you may want to use ?airports to get more information.

          -

Solution: lat and lon represent the airport's geographic coordinates, alt is the altitude of the airport above sea level (run airports %>% filter(faa == "DEN") to see the altitude of Denver International Airport), tz is the time zone difference with respect to GMT in London UK, dst is the daylight saving time zone, and tzone is the time zone label.
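For instance, a quick way to run the altitude check mentioned above (a minimal sketch, assuming dplyr and nycflights13 are loaded):

library(dplyr)
library(nycflights13)

# Look up Denver International Airport and its altitude above sea level
airports %>% 
  filter(faa == "DEN") %>% 
  select(faa, name, alt)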

          -

          (LC4.3) Provide the names of variables in a data frame with at least three variables in which one of them is an identification variable and the other two are not. In other words, create your own tidy dataset that matches these conditions.

          -

          Solution:

• In the weather example in LC3.8, the combination of origin, year, month, day, and hour are identification variables as they identify the observation in question.
• Anything else pertains to observations: temp, humid, wind_speed, etc.

          (LC4.4) Convert the dem_score data frame into a tidy data frame and assign the name of dem_score_tidy to the resulting long-formatted data frame.

          -

          Solution: Running the following in the console:

          -
dem_score_tidy <- gather(data = dem_score, key = year, value = democracy_score, -country)
          -

          Let’s now compare the dem_score and dem_score_tidy. dem_score has democracy score information for each year in columns, whereas in dem_score_tidy there are explicit variables year and democracy_score. While both representations of the data contain the same information, we can only use ggplot() to create plots using the dem_score_tidy data frame.

          -
          dem_score
          -
          # A tibble: 96 x 10
          -   country    `1952` `1957` `1962` `1967` `1972` `1977` `1982` `1987` `1992`
          -   <chr>       <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
          - 1 Albania        -9     -9     -9     -9     -9     -9     -9     -9      5
          - 2 Argentina      -9     -1     -1     -9     -9     -9     -8      8      7
          - 3 Armenia        -9     -7     -7     -7     -7     -7     -7     -7      7
          - 4 Australia      10     10     10     10     10     10     10     10     10
          - 5 Austria        10     10     10     10     10     10     10     10     10
          - 6 Azerbaijan     -9     -7     -7     -7     -7     -7     -7     -7      1
          - 7 Belarus        -9     -7     -7     -7     -7     -7     -7     -7      7
          - 8 Belgium        10     10     10     10     10     10     10     10     10
          - 9 Bhutan        -10    -10    -10    -10    -10    -10    -10    -10    -10
          -10 Bolivia        -4     -3     -3     -4     -7     -7      8      9      9
          -# … with 86 more rows
          -
          dem_score_tidy
          -
          # A tibble: 864 x 3
          -   country    year  democracy_score
          -   <chr>      <chr>           <dbl>
          - 1 Albania    1952               -9
          - 2 Argentina  1952               -9
          - 3 Armenia    1952               -9
          - 4 Australia  1952               10
          - 5 Austria    1952               10
          - 6 Azerbaijan 1952               -9
          - 7 Belarus    1952               -9
          - 8 Belgium    1952               10
          - 9 Bhutan     1952              -10
          -10 Bolivia    1952               -4
          -# … with 854 more rows
          -

          (LC4.5) Read in the life expectancy data stored at https://moderndive.com/data/le_mess.csv and convert it to a tidy data frame.

          -

          Solution: The code is similar

          -
          life_expectancy <- read_csv('https://moderndive.com/data/le_mess.csv')
          -life_expectancy_tidy <- gather(data = life_expectancy, key = year, value = life_expectancy, -country)
          -

We observe the same structure with respect to year in life_expectancy vs life_expectancy_tidy as we did in dem_score vs dem_score_tidy:

          -
          life_expectancy
          -
          # A tibble: 202 x 67
          -   country `1951` `1952` `1953` `1954` `1955` `1956` `1957` `1958` `1959` `1960`
          -   <chr>    <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
          - 1 Afghan…   27.1   27.7   28.2   28.7   29.3   29.8   30.3   30.9   31.4   31.9
          - 2 Albania   54.7   55.2   55.8   56.6   57.4   58.4   59.5   60.6   61.8   62.9
          - 3 Algeria   43.0   43.5   44.0   44.4   44.9   45.4   45.9   46.4   47.0   47.5
          - 4 Angola    31.0   31.6   32.1   32.7   33.2   33.8   34.3   34.9   35.4   36.0
          - 5 Antigu…   58.3   58.8   59.3   59.9   60.4   60.9   61.4   62.0   62.5   63.0
          - 6 Argent…   61.9   62.5   63.1   63.6   64.0   64.4   64.7   65     65.2   65.4
          - 7 Armenia   62.7   63.1   63.6   64.1   64.5   65     65.4   65.9   66.4   66.9
          - 8 Aruba     59.0   60.0   61.0   61.9   62.7   63.4   64.1   64.7   65.2   65.7
          - 9 Austra…   68.7   69.1   69.7   69.8   70.2   70.0   70.3   70.9   70.4   70.9
          -10 Austria   65.2   66.8   67.3   67.3   67.6   67.7   67.5   68.5   68.4   68.8
          -# … with 192 more rows, and 56 more variables: `1961` <dbl>, `1962` <dbl>,
          -#   `1963` <dbl>, `1964` <dbl>, `1965` <dbl>, `1966` <dbl>, `1967` <dbl>,
          -#   `1968` <dbl>, `1969` <dbl>, `1970` <dbl>, `1971` <dbl>, `1972` <dbl>,
          -#   `1973` <dbl>, `1974` <dbl>, `1975` <dbl>, `1976` <dbl>, `1977` <dbl>,
          -#   `1978` <dbl>, `1979` <dbl>, `1980` <dbl>, `1981` <dbl>, `1982` <dbl>,
          -#   `1983` <dbl>, `1984` <dbl>, `1985` <dbl>, `1986` <dbl>, `1987` <dbl>,
          -#   `1988` <dbl>, `1989` <dbl>, `1990` <dbl>, `1991` <dbl>, `1992` <dbl>,
          -#   `1993` <dbl>, `1994` <dbl>, `1995` <dbl>, `1996` <dbl>, `1997` <dbl>,
          -#   `1998` <dbl>, `1999` <dbl>, `2000` <dbl>, `2001` <dbl>, `2002` <dbl>,
          -#   `2003` <dbl>, `2004` <dbl>, `2005` <dbl>, `2006` <dbl>, `2007` <dbl>,
          -#   `2008` <dbl>, `2009` <dbl>, `2010` <dbl>, `2011` <dbl>, `2012` <dbl>,
          -#   `2013` <dbl>, `2014` <dbl>, `2015` <dbl>, `2016` <dbl>
          -
          life_expectancy_tidy
          -
          # A tibble: 13,332 x 3
          -   country             year  life_expectancy
          -   <chr>               <chr>           <dbl>
          - 1 Afghanistan         1951             27.1
          - 2 Albania             1951             54.7
          - 3 Algeria             1951             43.0
          - 4 Angola              1951             31.0
          - 5 Antigua and Barbuda 1951             58.3
          - 6 Argentina           1951             61.9
          - 7 Armenia             1951             62.7
          - 8 Aruba               1951             59.0
          - 9 Australia           1951             68.7
          -10 Austria             1951             65.2
          -# … with 13,322 more rows
          -

          (LC4.6) What are common characteristics of “tidy” datasets?

          -

          Solution: Rows correspond to observations, while columns correspond to variables.

          -

          (LC4.7) What makes “tidy” datasets useful for organizing data?

          -

          Solution: Tidy datasets are an organized way of viewing data. We’ll see later that this format is required for the ggplot2 and dplyr packages for data visualization and wrangling.

          -

          (LC4.8) What are some advantages of data in normal forms? What are some disadvantages?

          -

Solution: When datasets are in normal form, we can easily join them with other datasets! For example, we can join the flights data with the planes data. We'll see this more in Chapter 5!
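For example, a minimal sketch of such a join (assuming dplyr and nycflights13 are loaded; this is the same join used in the available seat miles exercise in the Chapter 5 solutions below):

library(dplyr)
library(nycflights13)

# Match each flight to the characteristics of the plane that flew it,
# using the tail number as the key variable
flights %>% 
  inner_join(planes, by = "tailnum")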

          -
          -
          -
          -

          D.4 Chapter 5 Solutions

          -
          library(dplyr)
          -library(ggplot2)
           library(nycflights13)
          -

          (LC5.1) What’s another way using the “not” operator ! we could filter only the rows that are not going to Burlington, VT nor Seattle, WA in the flights data frame? Test this out using the code above.

          +

          (LC4.1) What’s another way using the “not” operator ! to filter only the rows that are not going to Burlington, VT nor Seattle, WA in the flights data frame? Test this out using the code above.

          Solution:

          # Original in book
           not_BTV_SEA <- flights %>% 
          @@ -976,13 +848,13 @@ 

          D.4 Chapter 5 Solutions

# Yet another way
not_BTV_SEA <- flights %>% 
  filter(dest != "BTV" & dest != "SEA")
          -

          (LC5.2) Say a doctor is studying the effect of smoking on lung cancer for a large number of patients who have records measured at five year intervals. She notices that a large number of patients have missing data points because the patient has died, so she chooses to ignore these patients in her analysis. What is wrong with this doctor’s approach?

          +

          (LC4.2) Say a doctor is studying the effect of smoking on lung cancer for a large number of patients who have records measured at five year intervals. She notices that a large number of patients have missing data points because the patient has died, so she chooses to ignore these patients in her analysis. What is wrong with this doctor’s approach?

Solution: The missing patients may have died of lung cancer! So to ignore them might seriously bias your results! It is very important to think about the consequences of ignoring missing data for your analysis! Ask yourself:

• Is there a systematic reason why certain values are missing? If so, you might be biasing your results!
• If there isn't, then it might be OK to "sweep missing values under the rug."
          -

          (LC5.3) Modify the above summarize function to create summary_temp to also use the n() summary function: summarize(count = n()). What does the returned value correspond to?

          +

          (LC4.3) Modify the above summarize function to create summary_temp to also use the n() summary function: summarize(count = n()). What does the returned value correspond to?

          Solution: It corresponds to a count of the number of observations/rows:

          weather %>% 
             summarize(count = n())
          @@ -990,7 +862,7 @@

          D.4 Chapter 5 Solutions

  count
  <int>
1 26115
          -

          (LC5.4) Why doesn’t the following code work? Run the code line by line instead of all at once, and then look at the data. In other words, run summary_temp <- weather %>% summarize(mean = mean(temp, na.rm = TRUE)) first.

          +

          (LC4.4) Why doesn’t the following code work? Run the code line by line instead of all at once, and then look at the data. In other words, run summary_temp <- weather %>% summarize(mean = mean(temp, na.rm = TRUE)) first.

          summary_temp <- weather %>%   
             summarize(mean = mean(temp, na.rm = TRUE)) %>% 
             summarize(std_dev = sd(temp, na.rm = TRUE))
          @@ -1002,182 +874,11 @@

          D.4 Chapter 5 Solutions

   mean
  <dbl>
1  55.3

          Because after the first summarize(), the variable temp disappears as it has been collapsed to the value mean. So when we try to run the second summarize(), it can’t find the variable temp to compute the standard deviation of.
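One way to avoid the problem (a minimal sketch, assuming dplyr, nycflights13, and hence the weather data frame are loaded) is to compute both summaries in a single summarize() call:

summary_temp <- weather %>% 
  summarize(mean = mean(temp, na.rm = TRUE), 
            std_dev = sd(temp, na.rm = TRUE))
summary_temp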

          -

          (LC5.5) Recall from Chapter 3 when we looked at plots of temperatures by months in NYC. What does the standard deviation column in the summary_monthly_temp data frame tell us about temperatures in New York City throughout the year?

          +

          (LC4.5) Recall from Chapter 3 when we looked at plots of temperatures by months in NYC. What does the standard deviation column in the summary_monthly_temp data frame tell us about temperatures in New York City throughout the year?

          Solution:

month   mean  std_dev
    1   35.6    10.22
    2   34.3     6.98
    3   39.9     6.25
    4   51.7     8.79
    5   61.8     9.68
    6   72.2     7.55
    7   80.1     7.12
    8   74.5     5.19
    9   67.4     8.47
   10   60.1     8.85
   11   45.0    10.44
   12   38.4     9.98

          The standard deviation is a quantification of spread and variability. We see that the period in November, December, and January has the most variation in weather, so you can expect very different temperatures on different days.

          -

          (LC5.6) What code would be required to get the mean and standard deviation temperature for each day in 2013 for NYC?

          +

          (LC4.6) What code would be required to get the mean and standard deviation temperature for each day in 2013 for NYC?

          Solution:

          -
          summary_temp_by_day <- weather %>% 
          -  group_by(year, month, day) %>% 
          -  summarize(
          -          mean = mean(temp, na.rm = TRUE),
          -          std_dev = sd(temp, na.rm = TRUE)
          -          )
          -summary_temp_by_day
          -
          # A tibble: 364 x 5
          -# Groups:   year, month [?]
          -    year month   day  mean std_dev
          -   <dbl> <dbl> <int> <dbl>   <dbl>
          - 1  2013     1     1  37.0    4.00
          - 2  2013     1     2  28.7    3.45
          - 3  2013     1     3  30.0    2.58
          - 4  2013     1     4  34.9    2.45
          - 5  2013     1     5  37.2    4.01
          - 6  2013     1     6  40.1    4.40
          - 7  2013     1     7  40.6    3.68
          - 8  2013     1     8  40.1    5.77
          - 9  2013     1     9  43.2    5.40
          -10  2013     1    10  43.8    2.95
          -# … with 354 more rows

Note: group_by(day) is not enough, because day is a value between 1 and 31. We need to group_by(year, month, day).

          library(dplyr)
           library(nycflights13)
          @@ -1188,842 +889,20 @@ 

          D.4 Chapter 5 Solutions

  mean = mean(temp, na.rm = TRUE), 
  std_dev = sd(temp, na.rm = TRUE)
)
          -

          (LC5.7) Recreate by_monthly_origin, but instead of grouping via group_by(origin, month), group variables in a different order group_by(month, origin). What differs in the resulting dataset?

          +

          (LC4.7) Recreate by_monthly_origin, but instead of grouping via group_by(origin, month), group variables in a different order group_by(month, origin). What differs in the resulting dataset?

          Solution:

          -
          by_monthly_origin <- flights %>% 
          -  group_by(month, origin) %>% 
          -  summarize(count = n())
          by_monthly_origin
month  origin  count
    1  EWR      9893
    1  JFK      9161
    1  LGA      7950
    2  EWR      9107
    2  JFK      8421
    2  LGA      7423
    3  EWR     10420
    3  JFK      9697
    3  LGA      8717
    4  EWR     10531
    4  JFK      9218
    4  LGA      8581
    5  EWR     10592
    5  JFK      9397
    5  LGA      8807
    6  EWR     10175
    6  JFK      9472
    6  LGA      8596
    7  EWR     10475
    7  JFK     10023
    7  LGA      8927
    8  EWR     10359
    8  JFK      9983
    8  LGA      8985
    9  EWR      9550
    9  JFK      8908
    9  LGA      9116
   10  EWR     10104
   10  JFK      9143
   10  LGA      9642
   11  EWR      9707
   11  JFK      8710
   11  LGA      8851
   12  EWR      9922
   12  JFK      9146
   12  LGA      9067

          In by_monthly_origin the month column is now first and the rows are sorted by month instead of origin. If you compare the values of count in by_origin_monthly and by_monthly_origin using the View() function, you’ll see that the values are actually the same, just presented in a different order.

          -

          (LC5.8) How could we identify how many flights left each of the three airports for each carrier?

          +

          (LC4.8) How could we identify how many flights left each of the three airports for each carrier?

          Solution: We could summarize the count from each airport using the n() function, which counts rows.

          -
          count_flights_by_airport <- flights %>% 
          -  group_by(origin, carrier) %>% 
          -  summarize(count=n())
          -
          count_flights_by_airport
origin  carrier  count
EWR     9E        1268
EWR     AA        3487
EWR     AS         714
EWR     B6        6557
EWR     DL        4342
EWR     EV       43939
EWR     MQ        2276
EWR     OO           6
EWR     UA       46087
EWR     US        4405
EWR     VX        1566
EWR     WN        6188
JFK     9E       14651
JFK     AA       13783
JFK     B6       42076
JFK     DL       20701
JFK     EV        1408
JFK     HA         342
JFK     MQ        7193
JFK     UA        4534
JFK     US        2995
JFK     VX        3596
LGA     9E        2541
LGA     AA       15459
LGA     B6        6002
LGA     DL       23067
LGA     EV        8826
LGA     F9         685
LGA     FL        3260
LGA     MQ       16928
LGA     OO          26
LGA     UA        8044
LGA     US       13136
LGA     WN        6087
LGA     YV         601

All remarkably similar! Note: the n() function counts rows, whereas the sum(VARIABLE_NAME) function sums all values of a certain numerical variable VARIABLE_NAME.
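For instance, a minimal sketch contrasting the two (assuming dplyr and nycflights13 are loaded; the column names num_flights and total_distance are just illustrative):

flights %>% 
  group_by(origin) %>% 
  summarize(num_flights = n(),                            # counts rows
            total_distance = sum(distance, na.rm = TRUE)) # sums a numerical variable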

          -

          (LC5.9) How does the filter operation differ from a group_by followed by a summarize?

          +

          (LC4.9) How does the filter operation differ from a group_by followed by a summarize?

          Solution:

          • filter picks out rows from the original dataset without modifying them, whereas
          • group_by %>% summarize computes summaries of numerical variables, and hence reports new values.
          -

          (LC5.10) What do positive values of the gain variable in flights correspond to? What about negative values? And what about a zero value?

          +

          (LC4.10) What do positive values of the gain variable in flights correspond to? What about negative values? And what about a zero value?

          Solution:

          • Say a flight departed 20 minutes late, i.e. dep_delay = 20
@@ -2032,141 +911,25 @@

            D.4 Chapter 5 Solutions

          • 0 means the departure and arrival time were the same, so no time was made up in the air. We see in most cases that the gain is near 0 minutes.
• I never understood this. If the pilot says "we're going to make up time in the air" by flying faster because of a delay, why don't you always just fly faster to begin with?
          -

          (LC5.11) Could we create the dep_delay and arr_delay columns by simply subtracting dep_time from sched_dep_time and similarly for arrivals? Try the code out and explain any differences between the result and what actually appears in flights.

          +

          (LC4.11) Could we create the dep_delay and arr_delay columns by simply subtracting dep_time from sched_dep_time and similarly for arrivals? Try the code out and explain any differences between the result and what actually appears in flights.

Solution: No, because you can't do direct arithmetic on times. The difference in time between 12:03 and 11:59 is 4 minutes, but 1203 - 1159 = 44.
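A quick way to check this yourself (a minimal sketch, assuming dplyr and nycflights13 are loaded; dep_delay_naive is just an illustrative name):

flights %>% 
  # dep_time and sched_dep_time are stored as integers in HHMM format,
  # so naive subtraction mixes hours and minutes
  mutate(dep_delay_naive = dep_time - sched_dep_time) %>% 
  select(dep_time, sched_dep_time, dep_delay, dep_delay_naive)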

          -

          (LC5.12) What can we say about the distribution of gain? Describe it in a few sentences using the plot and the gain_summary data frame values.

          +

          (LC4.12) What can we say about the distribution of gain? Describe it in a few sentences using the plot and the gain_summary data frame values.

Solution: Most of the time the gain is a little under zero, and it is almost always between -50 and 50 minutes. There are some extreme cases, however!

          -

          (LC5.13) Looking at Figure 4.7, when joining flights and weather (or, in other words, matching the hourly weather values with each flight), why do we need to join by all of year, month, day, hour, and origin, and not just hour?

          +

          (LC4.13) Looking at Figure 4.7, when joining flights and weather (or, in other words, matching the hourly weather values with each flight), why do we need to join by all of year, month, day, hour, and origin, and not just hour?

Solution: Because hour is simply a value between 0 and 23; to identify a specific hour, we also need to know the year, month, and day, and origin tells us at which airport the weather was measured.
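As a sketch of the join itself (assuming dplyr and nycflights13 are loaded; the name flights_weather_joined is just illustrative):

flights_weather_joined <- flights %>% 
  inner_join(weather, by = c("year", "month", "day", "hour", "origin"))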

          -

          (LC5.14) What surprises you about the top 10 destinations from NYC in 2013?

          +

          (LC4.14) What surprises you about the top 10 destinations from NYC in 2013?

          Solution: This question is subjective! What surprises me is the high number of flights to Boston. Wouldn’t it be easier and quicker to take the train?
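If you want to recreate the top 10 destinations yourself, here is one possible sketch (assuming dplyr and nycflights13 are loaded; num_flights is just an illustrative name):

flights %>% 
  group_by(dest) %>% 
  summarize(num_flights = n()) %>% 
  arrange(desc(num_flights)) %>% 
  slice(1:10)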

          -

          (LC5.15) What are some ways to select all three of the dest, air_time, and distance variables from flights? Give the code showing how to do this in at least three different ways.

          +

          (LC4.15) What are some advantages of data in normal forms? What are some disadvantages?

          +

Solution: When datasets are in normal form, we can easily join them with other datasets! For example, we can join the flights data with the planes data.

          +

          (LC4.16) What are some ways to select all three of the dest, air_time, and distance variables from flights? Give the code showing how to do this in at least three different ways.

          Solution:

          -
          # The regular way:
          -flights %>% 
          -  select(dest, air_time, distance)
          -
          # A tibble: 336,776 x 3
          -   dest  air_time distance
          -   <chr>    <dbl>    <dbl>
          - 1 IAH        227     1400
          - 2 IAH        227     1416
          - 3 MIA        160     1089
          - 4 BQN        183     1576
          - 5 ATL        116      762
          - 6 ORD        150      719
          - 7 FLL        158     1065
          - 8 IAD         53      229
          - 9 MCO        140      944
          -10 ORD        138      733
          -# … with 336,766 more rows
          -
          # Since they are sequential columns in the dataset
          -flights %>% 
          -  select(dest:distance)
          -
          # A tibble: 336,776 x 3
          -   dest  air_time distance
          -   <chr>    <dbl>    <dbl>
          - 1 IAH        227     1400
          - 2 IAH        227     1416
          - 3 MIA        160     1089
          - 4 BQN        183     1576
          - 5 ATL        116      762
          - 6 ORD        150      719
          - 7 FLL        158     1065
          - 8 IAD         53      229
          - 9 MCO        140      944
          -10 ORD        138      733
          -# … with 336,766 more rows
          -
          # Not as effective, by removing everything else
          -flights %>% 
          -  select(-year, -month, -day, -dep_time, -sched_dep_time, -dep_delay, -arr_time,
          -         -sched_arr_time, -arr_delay, -carrier, -flight, -tailnum, -origin, 
          -         -hour, -minute, -time_hour)
          -
          # A tibble: 336,776 x 6
          -   dest  air_time distance  gain hours gain_per_hour
          -   <chr>    <dbl>    <dbl> <dbl> <dbl>         <dbl>
          - 1 IAH        227     1400    -9 3.78          -2.38
          - 2 IAH        227     1416   -16 3.78          -4.23
          - 3 MIA        160     1089   -31 2.67         -11.6 
          - 4 BQN        183     1576    17 3.05           5.57
          - 5 ATL        116      762    19 1.93           9.83
          - 6 ORD        150      719   -16 2.5           -6.4 
          - 7 FLL        158     1065   -24 2.63          -9.11
          - 8 IAD         53      229    11 0.883         12.5 
          - 9 MCO        140      944     5 2.33           2.14
          -10 ORD        138      733   -10 2.3           -4.35
          -# … with 336,766 more rows
          -

          (LC5.16) How could one use starts_with, ends_with, and contains to select columns from the flights data frame? Provide three different examples in total: one for starts_with, one for ends_with, and one for contains.

          +

          (LC4.17) How could one use starts_with, ends_with, and contains to select columns from the flights data frame? Provide three different examples in total: one for starts_with, one for ends_with, and one for contains.

          Solution:

          -
          # Anything that starts with "d"
          -flights %>% 
          -  select(starts_with("d"))
          -
          # A tibble: 336,776 x 5
          -     day dep_time dep_delay dest  distance
          -   <int>    <int>     <dbl> <chr>    <dbl>
          - 1     1      517         2 IAH       1400
          - 2     1      533         4 IAH       1416
          - 3     1      542         2 MIA       1089
          - 4     1      544        -1 BQN       1576
          - 5     1      554        -6 ATL        762
          - 6     1      554        -4 ORD        719
          - 7     1      555        -5 FLL       1065
          - 8     1      557        -3 IAD        229
          - 9     1      557        -3 MCO        944
          -10     1      558        -2 ORD        733
          -# … with 336,766 more rows
          -
          # Anything related to delays:
          -flights %>% 
          -  select(ends_with("delay"))
          -
          # A tibble: 336,776 x 2
          -   dep_delay arr_delay
          -       <dbl>     <dbl>
          - 1         2        11
          - 2         4        20
          - 3         2        33
          - 4        -1       -18
          - 5        -6       -25
          - 6        -4        12
          - 7        -5        19
          - 8        -3       -14
          - 9        -3        -8
          -10        -2         8
          -# … with 336,766 more rows
          -
          # Anything related to departures:
          -flights %>% 
          -  select(contains("dep"))
          -
          # A tibble: 336,776 x 3
          -   dep_time sched_dep_time dep_delay
          -      <int>          <int>     <dbl>
          - 1      517            515         2
          - 2      533            529         4
          - 3      542            540         2
          - 4      544            545        -1
          - 5      554            600        -6
          - 6      554            558        -4
          - 7      555            600        -5
          - 8      557            600        -3
          - 9      557            600        -3
          -10      558            600        -2
          -# … with 336,766 more rows
          -

          (LC5.17) Why might we want to use the select() function on a data frame?

          +

          (LC4.18) Why might we want to use the select() function on a data frame?

          Solution: To narrow down the data frame, to make it easier to look at. Using View() for example.

          -

          (LC5.18) Create a new data frame that shows the top 5 airports with the largest arrival delays from NYC in 2013.

          +

          (LC4.19) Create a new data frame that shows the top 5 airports with the largest arrival delays from NYC in 2013.

          Solution:

          -
          top_five <- flights %>% 
          -  group_by(dest) %>% 
          -  summarize(avg_delay = mean(arr_delay, na.rm = TRUE)) %>% 
          -  arrange(desc(avg_delay)) %>% 
          -  top_n(n = 5)
          -top_five
          -
          # A tibble: 5 x 2
          -  dest  avg_delay
          -  <chr>     <dbl>
          -1 CAE        41.8
          -2 TUL        33.7
          -3 OKC        30.6
          -4 JAC        28.1
          -5 TYS        24.1
          -

          (LC5.19) Using the datasets included in the nycflights13 package, compute the available seat miles for each airline sorted in descending order. After completing all the necessary data wrangling steps, the resulting data frame should have 16 rows (one for each airline) and 2 columns (airline name and available seat miles). Here are some hints:

          +

          (LC4.20) Using the datasets included in the nycflights13 package, compute the available seat miles for each airline sorted in descending order. After completing all the necessary data wrangling steps, the resulting data frame should have 16 rows (one for each airline) and 2 columns (airline name and available seat miles). Here are some hints:

1. Crucial: Unless you are very confident in what you are doing, it is worthwhile not to start coding right away, but rather to first sketch out on paper all the necessary data wrangling steps, not using exact code but rather high-level pseudocode that is informal yet detailed enough to articulate what you are doing. This way you won't confuse what you are trying to do (the algorithm) with how you are going to do it (writing dplyr code).
          2. Take a close look at all the datasets using the View() function: flights, weather, planes, airports, and airlines to identify which variables are necessary to compute available seat miles.
@@ -2174,185 +937,77 @@

            D.4 Chapter 5 Solutions

          4. Consider the data wrangling verbs in Table 4.1 as your toolbox!

          Solution: Here are some examples of student-written pseudocode. Based on our own pseudocode, let’s first display the entire solution.

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  mutate(ASM = seats * distance) %>% 
          -  group_by(carrier) %>% 
          -  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
          -  arrange(desc(ASM))
          -
          # A tibble: 16 x 2
          -   carrier         ASM
          -   <chr>         <dbl>
          - 1 UA      15516377526
          - 2 DL      10532885801
          - 3 B6       9618222135
          - 4 AA       3677292231
          - 5 US       2533505829
          - 6 VX       2296680778
          - 7 EV       1817236275
          - 8 WN       1718116857
          - 9 9E        776970310
          -10 HA        642478122
          -11 AS        314104736
          -12 FL        219628520
          -13 F9        184832280
          -14 YV         20163632
          -15 MQ          7162420
          -16 OO          1299835

          Let’s now break this down step-by-step. To compute the available seat miles for a given flight, we need the distance variable from the flights data frame and the seats variable from the planes data frame, necessitating a join by the key variable tailnum as illustrated in Figure 4.7. To keep the resulting data frame easy to view, we’ll select() only these two variables and carrier:

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance)
          -
          # A tibble: 284,170 x 3
          -   carrier seats distance
          -   <chr>   <int>    <dbl>
          - 1 UA        149     1400
          - 2 UA        149     1416
          - 3 AA        178     1089
          - 4 B6        200     1576
          - 5 DL        178      762
          - 6 UA        191      719
          - 7 B6        200     1065
          - 8 EV         55      229
          - 9 B6        200      944
          -10 B6        200     1028
          -# … with 284,160 more rows

          Now for each flight we can compute the available seat miles ASM by multiplying the number of seats by the distance via a mutate():

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  # Added:
          -  mutate(ASM = seats * distance)
          -
          # A tibble: 284,170 x 4
          -   carrier seats distance    ASM
          -   <chr>   <int>    <dbl>  <dbl>
          - 1 UA        149     1400 208600
          - 2 UA        149     1416 210984
          - 3 AA        178     1089 193842
          - 4 B6        200     1576 315200
          - 5 DL        178      762 135636
          - 6 UA        191      719 137329
          - 7 B6        200     1065 213000
          - 8 EV         55      229  12595
          - 9 B6        200      944 188800
          -10 B6        200     1028 205600
          -# … with 284,160 more rows

          Next we want to sum the ASM for each carrier. We achieve this by first grouping by carrier and then summarizing using the sum() function:

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  mutate(ASM = seats * distance) %>% 
          -  # Added:
          -  group_by(carrier) %>% 
          -  summarize(ASM = sum(ASM))
          -
          # A tibble: 16 x 2
          -   carrier         ASM
          -   <chr>         <dbl>
          - 1 9E        776970310
          - 2 AA       3677292231
          - 3 AS        314104736
          - 4 B6       9618222135
          - 5 DL      10532885801
          - 6 EV       1817236275
          - 7 F9        184832280
          - 8 FL        219628520
          - 9 HA        642478122
          -10 MQ          7162420
          -11 OO          1299835
          -12 UA      15516377526
          -13 US       2533505829
          -14 VX       2296680778
          -15 WN       1718116857
          -16 YV         20163632
          -

          However, because for certain carriers certain flights have missing NA values, the resulting table also returns NA’s. We can eliminate these by adding a na.rm = TRUE argument to sum(), telling R that we want to remove the NA’s in the sum. We saw this in Section (summarize):

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  mutate(ASM = seats * distance) %>% 
          -  group_by(carrier) %>% 
          -  # Modified:
          -  summarize(ASM = sum(ASM, na.rm = TRUE))
          -
          # A tibble: 16 x 2
          -   carrier         ASM
          -   <chr>         <dbl>
          - 1 9E        776970310
          - 2 AA       3677292231
          - 3 AS        314104736
          - 4 B6       9618222135
          - 5 DL      10532885801
          - 6 EV       1817236275
          - 7 F9        184832280
          - 8 FL        219628520
          - 9 HA        642478122
          -10 MQ          7162420
          -11 OO          1299835
          -12 UA      15516377526
          -13 US       2533505829
          -14 VX       2296680778
          -15 WN       1718116857
          -16 YV         20163632
          +

          However, because for certain carriers certain flights have missing NA values, the resulting table also returns NA’s. We can eliminate these by adding a na.rm = TRUE argument to sum(), telling R that we want to remove the NA’s in the sum. We saw this in Section 4.3:

          Finally, we arrange() the data in desc()ending order of ASM.

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  mutate(ASM = seats * distance) %>% 
          -  group_by(carrier) %>% 
          -  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
          -  # Added:
          -  arrange(desc(ASM))
          -
          # A tibble: 16 x 2
          -   carrier         ASM
          -   <chr>         <dbl>
          - 1 UA      15516377526
          - 2 DL      10532885801
          - 3 B6       9618222135
          - 4 AA       3677292231
          - 5 US       2533505829
          - 6 VX       2296680778
          - 7 EV       1817236275
          - 8 WN       1718116857
          - 9 9E        776970310
          -10 HA        642478122
          -11 AS        314104736
          -12 FL        219628520
          -13 F9        184832280
          -14 YV         20163632
          -15 MQ          7162420
          -16 OO          1299835

While the above data frame is correct, the IATA carrier code is not always useful. For example, what carrier is WN? We can address this by joining with the airlines dataset using carrier as the key variable. While this step is not absolutely required, it goes a long way to making the table easier to make sense of. It is important to be empathetic with the ultimate consumers of your presented data!

          -
          flights %>% 
          -  inner_join(planes, by = "tailnum") %>% 
          -  select(carrier, seats, distance) %>% 
          -  mutate(ASM = seats * distance) %>% 
          -  group_by(carrier) %>% 
          -  summarize(ASM = sum(ASM, na.rm = TRUE)) %>% 
          -  arrange(desc(ASM)) %>% 
          -  # Added:
          -  inner_join(airlines, by = "carrier")
          -
          # A tibble: 16 x 3
          -   carrier         ASM name                       
          -   <chr>         <dbl> <chr>                      
          - 1 UA      15516377526 United Air Lines Inc.      
          - 2 DL      10532885801 Delta Air Lines Inc.       
          - 3 B6       9618222135 JetBlue Airways            
          - 4 AA       3677292231 American Airlines Inc.     
          - 5 US       2533505829 US Airways Inc.            
          - 6 VX       2296680778 Virgin America             
          - 7 EV       1817236275 ExpressJet Airlines Inc.   
          - 8 WN       1718116857 Southwest Airlines Co.     
          - 9 9E        776970310 Endeavor Air Inc.          
          -10 HA        642478122 Hawaiian Airlines Inc.     
          -11 AS        314104736 Alaska Airlines Inc.       
          -12 FL        219628520 AirTran Airways Corporation
          -13 F9        184832280 Frontier Airlines Inc.     
          -14 YV         20163632 Mesa Airlines Inc.         
          -15 MQ          7162420 Envoy Air                  
          -16 OO          1299835 SkyWest Airlines Inc.      
          +
          +
          +
          +

          D.4 Chapter 5 Solutions

          +
          library(dplyr)
          +library(ggplot2)
          +library(nycflights13)
          +library(tidyr)
          +library(readr)
          +

          (LC5.1) What are common characteristics of “tidy” datasets?

          +

          Solution: Rows correspond to observations, while columns correspond to variables.

          +

          (LC5.2) What makes “tidy” datasets useful for organizing data?

          +

          Solution: Tidy datasets are an organized way of viewing data. This format is required for the ggplot2 and dplyr packages for data visualization and wrangling.

          +

          (LC5.3) Take a look the airline_safety data frame included in the fivethirtyeight data. Run the following:

          +
          airline_safety
          +

After reading the help file by running ?airline_safety, we see that airline_safety is a data frame containing information on different airline companies' safety records. This data was originally reported on the data journalism website FiveThirtyEight.com in Nate Silver's article "Should Travelers Avoid Flying Airlines That Have Had Crashes in the Past?". Let's ignore the incl_reg_subsidiaries and avail_seat_km_per_week variables for simplicity:

          +
          airline_safety_smaller <- airline_safety %>% 
          +  select(-c(incl_reg_subsidiaries, avail_seat_km_per_week))
          +airline_safety_smaller
          +
          # A tibble: 56 x 7
          +   airline incidents_85_99 fatal_accidents… fatalities_85_99 incidents_00_14
          +   <chr>             <int>            <int>            <int>           <int>
          + 1 Aer Li…               2                0                0               0
          + 2 Aerofl…              76               14              128               6
          + 3 Aeroli…               6                0                0               1
          + 4 Aerome…               3                1               64               5
          + 5 Air Ca…               2                0                0               2
          + 6 Air Fr…              14                4               79               6
          + 7 Air In…               2                1              329               4
          + 8 Air Ne…               3                0                0               5
          + 9 Alaska…               5                0                0               5
          +10 Alital…               7                2               50               4
          +# … with 46 more rows, and 2 more variables: fatal_accidents_00_14 <int>,
          +#   fatalities_00_14 <int>
          +

This data frame is not in "tidy" format. How would you convert this data frame to be in "tidy" format, in particular so that it has a variable incident_type_years indicating the incident type/year and a variable count of the counts?

          +

          Solution: Using the gather() function from the tidyr package:

          +
          airline_safety_smaller_tidy <- airline_safety_smaller %>% 
          +  gather(key = incident_type_years, value = count, -airline)
          +airline_safety_smaller_tidy
          +
          # A tibble: 336 x 3
          +   airline               incident_type_years count
          +   <chr>                 <chr>               <int>
          + 1 Aer Lingus            incidents_85_99         2
          + 2 Aeroflot              incidents_85_99        76
          + 3 Aerolineas Argentinas incidents_85_99         6
          + 4 Aeromexico            incidents_85_99         3
          + 5 Air Canada            incidents_85_99         2
          + 6 Air France            incidents_85_99        14
          + 7 Air India             incidents_85_99         2
          + 8 Air New Zealand       incidents_85_99         3
          + 9 Alaska Airlines       incidents_85_99         5
          +10 Alitalia              incidents_85_99         7
          +# … with 326 more rows
          +

          If you look at the resulting airline_safety_smaller_tidy data frame in the spreadsheet viewer, you’ll see that the variable incident_type_years has 6 possible values: "incidents_85_99", "fatal_accidents_85_99", "fatalities_85_99", "incidents_00_14", "fatal_accidents_00_14", "fatalities_00_14" corresponding to the 6 columns of airline_safety_smaller we tidied.

          +

          (LC5.4) Convert the dem_score data frame into a tidy data frame and assign the name of dem_score_tidy to the resulting long-formatted data frame.

          +

          Solution: Running the following in the console:

          +
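Presumably the same gather() call as in the Chapter 4 solutions above (assuming tidyr and the dem_score data frame are loaded):

dem_score_tidy <- gather(data = dem_score, key = year, value = democracy_score, -country)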

          Let’s now compare the dem_score and dem_score_tidy. dem_score has democracy score information for each year in columns, whereas in dem_score_tidy there are explicit variables year and democracy_score. While both representations of the data contain the same information, we can only use ggplot() to create plots using the dem_score_tidy data frame.

          +

          (LC5.5) Read in the life expectancy data stored at https://moderndive.com/data/le_mess.csv and convert it to a tidy data frame.

          +

          Solution: The code is similar

          +
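Presumably the same code as in the Chapter 4 solutions above (assuming readr and tidyr are loaded):

life_expectancy <- read_csv('https://moderndive.com/data/le_mess.csv')
life_expectancy_tidy <- gather(data = life_expectancy, key = year, value = life_expectancy, -country)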

We observe the same structure with respect to year in life_expectancy vs life_expectancy_tidy as we did in dem_score vs dem_score_tidy:


          D.5 Chapter 6 Solutions

          +

          To come!

          library(ggplot2)
           library(dplyr)
           library(moderndive)
          diff --git a/docs/images/accuracy_vs_precision.jpg b/docs/images/accuracy_vs_precision.jpg
          new file mode 100644
          index 000000000..8c5c7d131
          Binary files /dev/null and b/docs/images/accuracy_vs_precision.jpg differ
          diff --git a/docs/images/accuracy_vs_precision.png b/docs/images/accuracy_vs_precision.png
          new file mode 100644
          index 000000000..0c1edcafa
          Binary files /dev/null and b/docs/images/accuracy_vs_precision.png differ
          diff --git a/docs/images/crash-test-dummy.jpg b/docs/images/crash-test-dummy.jpg
          new file mode 100644
          index 000000000..3364e6598
          Binary files /dev/null and b/docs/images/crash-test-dummy.jpg differ
          diff --git a/docs/images/crc_press.jpg b/docs/images/crc_press.jpg
          new file mode 100644
          index 000000000..c7a78c667
          Binary files /dev/null and b/docs/images/crc_press.jpg differ
          diff --git a/docs/images/flight-simulator.jpg b/docs/images/flight-simulator.jpg
          new file mode 100644
          index 000000000..7a5fe9df4
          Binary files /dev/null and b/docs/images/flight-simulator.jpg differ
          diff --git a/docs/images/import-cheatsheet-1.png b/docs/images/import_cheatsheet-1.png
          similarity index 100%
          rename from docs/images/import-cheatsheet-1.png
          rename to docs/images/import_cheatsheet-1.png
          diff --git a/docs/images/import-cheatsheet-2.png b/docs/images/import_cheatsheet-2.png
          similarity index 100%
          rename from docs/images/import-cheatsheet-2.png
          rename to docs/images/import_cheatsheet-2.png
          diff --git a/docs/index.html b/docs/index.html
          index d7b0cea89..012644159 100644
          --- a/docs/index.html
          +++ b/docs/index.html
          @@ -6,20 +6,20 @@
             
             
             Statistical Inference via Data Science
          -  
          +  
             
           
             
             
             
             
          -  
          +  
             
           
             
             
             
          -  
          +  
             
           
           
          @@ -214,9 +214,10 @@
           
        38. 4.5 mutate existing variables
        39. 4.6 arrange and sort rows
        40. 4.7 join data frames
        41. 4.8 Other verbs
43. Statistical inference: Once again using your newly acquired data science tools, we'll unpack statistical inference using the infer package. In particular:
• Ch.8: Understanding the role that sampling variability plays in statistical inference using both tactile and virtual simulations of sampling from a "bowl" with an unknown proportion of red balls.
• Ch.9: Building confidence intervals.
• Ch.10: Conducting hypothesis tests.
45. Data modeling revisited: Armed with your new understanding of statistical inference, you'll revisit and review the models you constructed in Ch.6 & Ch.7. In particular:
• Ch.11: Interpreting both the statistical and practical significance of the results of the models.

          We’ll end with a discussion on what it means to “think with data” in Chapter 12 and present an example case study data analysis of house prices in Seattle.

ModernDive Flowchart

@@ -696,48 +697,38 @@

          1.2.1 Who is this book for?

This book is intended for instructors of traditional introductory statistics classes using RStudio, either the desktop or server version, who would like to inject more data science topics into their syllabus. We assume that students taking the class will have no prior algebra, calculus, or programming/coding experience.

          Here are some principles and beliefs we kept in mind while writing this text. If you agree with them, this might be the book for you.

1. Blur the lines between lecture and lab
• With increased availability and accessibility of laptops and open-source non-proprietary statistical software, the strict dichotomy between lab and lecture can be loosened.
• It's much harder for students to understand the importance of using software if they only use it once a week or less. They forget the syntax in much the same way someone learning a foreign language forgets the rules. Frequent reinforcement is key.
2. Focus on the entire data/science research pipeline
3. It's all about the data
• We leverage R packages for rich, real, and realistic data-sets that at the same time are easy-to-load into R, such as the nycflights13 and fivethirtyeight packages.
• We believe that data visualization is a gateway drug for statistics and that the Grammar of Graphics as implemented in the ggplot2 package is the best way to impart such lessons. However, we often hear: "You can't teach ggplot2 for data visualization in intro stats!" We, like David Robinson, are much more optimistic.
• dplyr has made data wrangling much more accessible to novices, and hence much more interesting data-sets can be explored.
4. Use simulation/resampling to introduce statistical inference, not probability/mathematical formulas
• Instead of using formulas, large-sample approximations, and probability tables, we teach statistical concepts using resampling-based inference.
• This allows for a de-emphasis of traditional probability topics, freeing up room in the syllabus for other topics.
5. Don't fence off students from the computation pool, throw them in!
• Computing skills are essential to working with data in the 21st century. Given this fact, we feel that to shield students from computing is to ultimately do them a disservice.
• We are not teaching a course on coding/programming per se, but rather just enough of the computational and algorithmic thinking necessary for data analysis.
6. Complete reproducibility and customizability
• We are frustrated when textbooks give examples, but not the source code and the data itself. We give you the source code for all examples as well as the whole book!
• Ultimately the best textbook is one you've written yourself. You know best your audience, their background, and their priorities. You know best your own style and the types of examples and problems you like best. Customization is the ultimate end. For more about how to make this book your own, see About this Book.
          @@ -822,17 +813,24 @@

          1.4 Connect and contribute

          1.5 About this book

          This book was written using RStudio’s bookdown package by Yihui Xie (Xie 2018). This package simplifies the publishing of books by having all content written in R Markdown. The bookdown/R Markdown source code for all versions of ModernDive is available on GitHub:

          Could this be a new paradigm for textbooks? Instead of the traditional model of textbook companies publishing updated editions of the textbook every few years, we apply a software design influenced model of publishing more easily updated versions. We can then leverage open-source communities of instructors and developers for ideas, tools, resources, and feedback. As such, we welcome your pull requests.

          Finally, feel free to modify the book as you wish for your own needs, but please list the authors at the top of index.Rmd as “Chester Ismay, Albert Y. Kim, and YOU!”

          @@ -857,16 +855,20 @@

          1.6 About the authors

          Here, with us assuming the two population means are equal (\(H_0: \mu_r - \mu_a = 0\)), we can look at this from a tactile point of view by using index cards. There are \(n_r = 34\) data elements corresponding to romance movies and \(n_a = 34\) for action movies. We can write the 34 ratings from our sample for romance movies on one set of 34 index cards and the 34 ratings for action movies on another set of 34 index cards. (Note that the sample sizes need not be the same.)

          -

          The next step is to put the two stacks of index cards together, creating a new set of 68 cards. If we assume that the two population means are equal, we are saying that there is no association between ratings and genre (romance vs action). We can use the index cards to create two new stacks for romance and action movies. Note that the new “romance movie stack” will likely have some of the original action movies in it and likewise for the “action movie stack” including some romance movies from our original set. Since we are assuming that each card is equally likely to have appeared in either one of the stacks this makes sense. First, we must shuffle all the cards thoroughly. After doing so, in this case with equal values of sample sizes, we split the deck in half.

          +

          The next step is to put the two stacks of index cards together, creating a new set of 68 cards. If we assume that the two population means are equal, we are saying that there is no association between ratings and genre (romance vs action). We can use the index cards to create two new stacks for romance and action movies. First, we must shuffle all the cards thoroughly. After doing so, in this case with equal values of sample sizes, we split the deck in half.

          We then calculate the new sample mean rating of the romance deck, and also the new sample mean rating of the action deck. This creates one simulation of the samples that were collected originally. We next want to calculate a statistic from these two samples. Instead of actually doing the calculation using index cards, we can use R as we have before to simulate this process. Let’s do this just once and compare the results to what we see in movies_genre_sample.

          -
          movies_genre_sample %>% 
          -  specify(formula = rating ~ genre) %>%
          -  hypothesize(null = "independence") %>% 
          -  generate(reps = 1) %>% 
          -  calculate(stat = "diff in means", order = c("Romance", "Action"))
          -
          # A tibble: 1 x 1
          -   stat
          -  <dbl>
          -1 0.515
          +
          shuffled_ratings_old <- #movies_trimmed %>%
          +  movies_genre_sample %>% 
          +     mutate(genre = mosaic::shuffle(genre)) %>% 
          +     group_by(genre) %>%
          +     summarize(mean = mean(rating))
          +diff(shuffled_ratings_old$mean)
          +
          [1] 0.126
          +
          permuted_ratings <- movies_genre_sample %>% 
          +  specify(formula = rating ~ genre) %>% 
          +  generate(reps = 1)

Learning check

@@ -951,7 +930,7 @@

          11.7.8 Simulated data

          -

          11.7.9 Distribution of \(\delta\) under \(H_0\)

          +

          10.7.9 Distribution of \(\delta\) under \(H_0\)

          The generate() step completes a permutation sending values of ratings to potentially different values of genre from which they originally came. It simulates a shuffling of the ratings between the two levels of genre just as we could have done with index cards. We can now proceed in a similar way to what we have done previously with bootstrapping by repeating this process many times to create simulated samples, assuming the null hypothesis is true.

          generated_samples <- movies_genre_sample %>% 
             specify(formula = rating ~ genre) %>% 
@@ -960,31 +939,31 @@ 11.7.9 Distribution of

A null distribution of simulated differences in sample means is created with the specification of stat = "diff in means" for the calculate() step. The null distribution is similar to the bootstrap distribution we saw in Chapter 9, but remember that it consists of statistics generated assuming the null hypothesis is true.
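The generate() and calculate() steps of the pipeline above are cut off by the diff hunk. As a point of reference only, a sketch of what the full pipeline likely looks like, following the same infer pattern used earlier in the chapter (the reps value of 5000 is assumed from the surrounding text):

# Sketch of the assumed full pipeline for the null distribution
null_distribution_two_means <- movies_genre_sample %>% 
  specify(formula = rating ~ genre) %>% 
  hypothesize(null = "independence") %>% 
  generate(reps = 5000, type = "permute") %>% 
  calculate(stat = "diff in means", order = c("Romance", "Action"))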

          We can now plot the distribution of these simulated differences in means:

          null_distribution_two_means %>% visualize()
-FIGURE 11.7: Simulated differences in means histogram
+Figure 10.7: Simulated differences in means histogram

          -

          11.7.10 The p-value

          +

          10.7.10 The p-value

          Remember that we are interested in seeing where our observed sample mean difference of 0.95 falls on this null/randomization distribution. We are interested in simply a difference here so “more extreme” corresponds to values in both tails on the distribution. Let’s shade our null distribution to show a visual representation of our \(p\)-value:

          null_distribution_two_means %>% 
             visualize(obs_stat = obs_diff, direction = "both")
-FIGURE 11.8: Shaded histogram to show p-value
+Figure 10.8: Shaded histogram to show p-value

          Remember that the observed difference in means was 0.95. We have shaded red all values at or above that value and also shaded red those values at or below its negative value (since this is a two-tailed test). By giving obs_stat = obs_diff a vertical darker line is also shown at 0.95. To better estimate how large the \(p\)-value will be, we also increase the number of bins to 100 here from 20:

          null_distribution_two_means %>% 
             visualize(bins = 100, obs_stat = obs_diff, direction = "both")
-FIGURE 11.9: Histogram with vertical lines corresponding to observed statistic
+Figure 10.9: Histogram with vertical lines corresponding to observed statistic

          At this point, it is important to take a guess as to what the \(p\)-value may be. We can see that there are only a few permuted differences as extreme or more extreme than our observed effect (in both directions). Maybe we guess that this \(p\)-value is somewhere around 2%, or maybe 3%, but certainly not 30% or more. Lastly, we calculate the \(p\)-value directly using infer:

          @@ -994,11 +973,11 @@

          11.7.10 The p-value

          # A tibble: 1 x 1
             p_value
               <dbl>
          -1   0.006
          -

          We have around 0.6% of values as extreme or more extreme than our observed statistic in both directions. Assuming we are using a 5% significance level for \(\alpha\), we have evidence supporting the conclusion that the mean rating for romance movies is different from that of action movies. The next important idea is to better understand just how much higher of a mean rating can we expect the romance movies to have compared to that of action movies.

          +1 0.0046

          +

          We have around 0.46% of values as extreme or more extreme than our observed statistic in both directions. Assuming we are using a 5% significance level for \(\alpha\), we have evidence supporting the conclusion that the mean rating for romance movies is different from that of action movies. The next important idea is to better understand just how much higher of a mean rating can we expect the romance movies to have compared to that of action movies.

          -

          11.7.11 Corresponding confidence interval

          +

          10.7.11 Corresponding confidence interval

          One of the great things about the infer pipeline is that going between hypothesis tests and confidence intervals is incredibly simple. To create a null distribution, we ran

          null_distribution_two_means <- movies_genre_sample %>% 
             specify(formula = rating ~ genre) %>% 
          @@ -1035,7 +1014,7 @@ 

          11.7.11 Corresponding confidence

          -

          11.7.12 Summary

          +

          10.7.12 Summary

          To review, these are the steps one would take whenever you’d like to do a hypothesis test comparing values from the distributions of two groups:

            @@ -1054,13 +1033,13 @@

            11.7.12 Summary

          -

          11.8 Building theory-based methods using computation

          +

          10.8 Building theory-based methods using computation

          As a point of reference, we will now discuss the traditional theory-based way to conduct the hypothesis test for determining if there is a statistically significant difference in the sample mean rating of Action movies versus Romance movies. This method and ones like it work very well when the assumptions are met in order to run the test. They are based on probability models and distributions such as the normal and \(t\)-distributions.

These traditional methods date back to a time when researchers didn’t have access to computers that could run 5000 simulations in a few seconds, so they had to base their methods on probability theory instead. Many fields and researchers continue to use these methods, and that is the biggest reason for their inclusion here. It’s important to remember that a \(t\)-test or a \(z\)-test is really just an approximation of what you have already seen in this chapter using simulation and randomization. The focus here is on understanding how the shape of the \(t\)-curve comes about without digging too deeply into the mathematical underpinnings.

          -

          11.8.1 Example: \(t\)-test for two independent samples

          +

          10.8.1 Example: \(t\)-test for two independent samples

What is commonly done in statistics is the process of normalization. What this entails is calculating the mean and standard deviation of a variable. Then you subtract the mean from each value of your variable and divide by the standard deviation. The most common normalization is known as the \(z\)-score. The formula for a \(z\)-score is \[Z = \frac{x - \mu}{\sigma},\] where \(x\) represents the value of a variable, \(\mu\) represents the mean of the variable, and \(\sigma\) represents the standard deviation of the variable. Thus, if your variable has 10 elements, each one has a corresponding \(z\)-score that gives how many standard deviations away that value is from its mean. If the variable itself is (approximately) normally distributed, its \(z\)-scores follow the standard normal distribution with mean 0 and standard deviation 1, which has the common, bell-shaped pattern seen below.
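As a quick illustration of this normalization, here is a minimal sketch in R using made-up values (the vector x is hypothetical and not taken from the movies data):

# Hypothetical values, for illustration only
x <- c(2.1, 3.5, 4.0, 4.4, 5.8, 6.2, 7.0, 7.3, 8.1, 9.6)

# z-score: subtract the mean, then divide by the standard deviation
z <- (x - mean(x)) / sd(x)
z

# scale() performs the same standardization
as.numeric(scale(x))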

          -

          +

Recall that we hardly ever know the mean and standard deviation of the population of interest. This is almost always the case when considering the means of two independent groups. To help account for not knowing the population parameter values, we can use the sample statistics instead, but this comes with a bit of a price in terms of complexity.

          Another form of normalization occurs when we need to use the sample standard deviations as estimates for the unknown population standard deviations. This normalization is often called the \(t\)-score. For the two independent samples case like what we have for comparing action movies to romance movies, the formula is \[T =\dfrac{ (\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{ \sqrt{\dfrac{{s_1}^2}{n_1} + \dfrac{{s_2}^2}{n_2}} }\]

          There is a lot to try to unpack here.
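To see how each piece of the formula maps to a calculation, here is a minimal sketch in R. The summary statistics below are hypothetical placeholders (only the sample sizes of 34 match the text):

# Hypothetical summary statistics for two independent samples
xbar_1 <- 4.0; s_1 <- 1.0; n_1 <- 34   # e.g. romance movies
xbar_2 <- 3.0; s_2 <- 1.1; n_2 <- 34   # e.g. action movies

# Two-sample T statistic, using the null value mu_1 - mu_2 = 0
T_stat <- ((xbar_1 - xbar_2) - 0) / sqrt(s_1^2 / n_1 + s_2^2 / n_2)
T_stat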

@@ -1080,10 +1059,10 @@ 11.8.1 Example:

We saw what the null distribution of \(\delta = \bar{x}_1 - \bar{x}_2\) looks like using randomization above. Recall this distribution:

          ggplot(data = null_distribution_two_means, aes(x = stat)) +
             geom_histogram(color = "white", bins = 20)
-FIGURE 11.10: Simulated differences in means histogram
+Figure 10.10: Simulated differences in means histogram

          The infer package also includes some built-in theory-based statistics as well, so instead of going through the process of trying to transform the difference into a standardized form, we can just provide a different value for stat in calculate(). Recall the generated_samples data frame created via:

@@ -1095,12 +1074,12 @@ 11.8.1 Example:

null_distribution_t <- generated_samples %>% 
  calculate(stat = "t", order = c("Romance", "Action"))
null_distribution_t %>% visualize()

          +

          We see that the shape of this stat = "t" distribution is the same as that of stat = "diff in means". The scale has changed though with the \(t\) values having less spread than the difference in means.

          A traditional \(t\)-test doesn’t look at this simulated distribution, but instead it looks at the \(t\)-curve with degrees of freedom equal to 62.029. We can overlay this distribution over the top of our permuted \(t\) statistics using the method = "both" setting in visualize().

          null_distribution_t %>% 
             visualize(method = "both")
          -

          +

We can see that the curve does a good job of approximating the randomization distribution here. (More on when to expect this to be the case when we discuss conditions for the \(t\)-test in a bit.) To calculate the \(p\)-value in this case, we need to figure out how much of the total area under the \(t\)-curve is at or above our observed \(T\)-statistic, plus the area at or below the negative of the observed \(T\)-statistic. (Remember this is a two-tailed test so we are looking for a difference–values in the tails of either direction.) Just as we converted all of the simulated values to \(T\)-statistics, we must also do so for our observed effect \(\delta^*\):

          obs_t <- movies_genre_sample %>% 
             specify(formula = rating ~ genre) %>% 
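# The obs_t pipeline above is truncated by the diff hunk; a likely completion
# (an assumption, matching the stat = "t" calculation used for the null distribution):
obs_t <- movies_genre_sample %>% 
  specify(formula = rating ~ genre) %>% 
  calculate(stat = "t", order = c("Romance", "Action"))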
@@ -1108,11 +1087,11 @@ 11.8.1 Example:

null_distribution_t %>% 
  visualize(method = "both", obs_stat = obs_t, direction = "both")

          -

          +

          As we might have expected with this just being a standardization of the difference in means statistic that produced a small \(p\)-value, we also have a very small one here.

          -

          11.8.2 Conditions for t-test

          +

          10.8.2 Conditions for t-test

          The infer package does not automatically check conditions for the theoretical methods to work and this warning was given when we used method = "both". In order for the results of the \(t\)-test to be valid, three conditions must be met:

1. Independent observations in both samples
2. Nearly normal populations or large sample sizes (\(n \ge 30\))

@@ -1120,36 +1099,29 @@ 11.8.2 Conditions for t-test

3. Independently selected samples

          Condition 1: This is met since we sampled at random using R from our population.

          -

          Condition 2: Recall from Figure 11.4, that we know how the populations are distributed. Both of them are close to normally distributed. If we are a little concerned about this assumption, we also do have samples of size larger than 30 (\(n_1 = n_2 = 34\)).

          +

          Condition 2: Recall from Figure 10.4, that we know how the populations are distributed. Both of them are close to normally distributed. If we are a little concerned about this assumption, we also do have samples of size larger than 30 (\(n_1 = n_2 = 34\)).

          Condition 3: This is met since there is no natural pairing of a movie in the Action group to a movie in the Romance group.

Since all three conditions are met, we can be reasonably certain that the theory-based test will match the results of the randomization-based test using shuffling. Remember that theory-based tests can produce incorrect results if these assumptions are not carefully checked. The only assumption for randomization and computational-based methods is that the sample is selected at random. These methods are our preference and we strongly believe they should be yours as well, but it’s also important to see how theory-based tests can be done and used as an approximation for the computational techniques, at least until more researchers adopt the techniques that utilize the power of computers.

          -

          -
          -

          11.9 Conclusion

          -

          We conclude by showing the infer pipeline diagram. In Chapter 12, we’ll come back to regression and see how the ideas covered in Chapter 9 and this chapter can help in understanding the significance of predictors in modeling.

          +
          +

          10.9 Conclusion

          +

          We conclude by showing the infer pipeline diagram. In Chapter 11, we’ll come back to regression and see how the ideas covered in Chapter 9 and this chapter can help in understanding the significance of predictors in modeling.

          -
          -

          11.9.1 Script of R code

          -

          An R script file of all R code used in this chapter is available here.

          +
          +

          10.9.1 Script of R code

          +

          An R script file of all R code used in this chapter is available here.

diff --git a/docs/previous_versions/v0.4.0/11-inference-for-regression.html b/docs/previous_versions/v0.4.0/11-inference-for-regression.html
new file mode 100644
index 000000000..1b98665d0
--- /dev/null
+++ b/docs/previous_versions/v0.4.0/11-inference-for-regression.html
@@ -0,0 +1,914 @@
+[HTML page header and navigation omitted; page title: 11 Inference for Regression | An Introduction to Statistical and Data Sciences via R]

          11 Inference for Regression

+Note: This chapter is still under construction. If you would like to contribute, please check us out on GitHub at https://github.com/moderndive/moderndive_book.

          Needed packages

          +

          Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). Read Section 2.3 for information on how to install and load R packages.

          +
          library(ggplot2)
          +library(dplyr)
          +library(moderndive)
          +library(infer)
          +
          +
          +

          DataCamp

          +

Our approach of understanding both the statistical and practical significance of any regression results is aligned with the approach taken in Jo Hardin’s DataCamp course “Inference for Regression.” If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course.


          11.1 Simulation-based Inference for Regression

          +

We can also use the concept of permuting to determine the standard error of our null distribution and conduct a hypothesis test for a population slope. Let’s go back to our example on teacher evaluations from Chapters 6 and 7. We’ll begin in the basic regression setting to test whether we have evidence that a statistically significant positive relationship exists between teaching and beauty scores for the University of Texas professors. As we did in Chapter 6, teaching score will act as our outcome variable and bty_avg will be our explanatory variable. We will set up this hypothesis testing process as we have for each test before, via the “There is Only One Test” diagram in Figure 10.1, using the infer package.

          +
          +

          11.1.1 Data

          +

          Our data is stored in evals and we are focused on the measurements of the score and bty_avg variables there. Note that we don’t choose a subset of variables here since we will specify() the variables of interest using infer.

          +
          evals %>% 
          +  specify(score ~ bty_avg)
          +
          Response: score (numeric)
          +Explanatory: bty_avg (numeric)
          +# A tibble: 463 x 2
          +   score bty_avg
          +   <dbl>   <dbl>
          + 1   4.7    5   
          + 2   4.1    5   
          + 3   3.9    5   
          + 4   4.8    5   
          + 5   4.6    3   
          + 6   4.3    3   
          + 7   2.8    3   
          + 8   4.1    3.33
          + 9   3.4    3.33
          +10   4.5    3.17
          +# … with 453 more rows
          +
          +
          +

          11.1.2 Test statistic \(\delta\)

          +

          Our test statistic here is the sample slope coefficient that we denote with \(b_1\).

          +
          +
          +

          11.1.3 Observed effect \(\delta^*\)

          +

          We can use the specify() %>% calculate() shortcut here to determine the slope value seen in our observed data:

          +
          slope_obs <- evals %>% 
          +  specify(score ~ bty_avg) %>% 
          +  calculate(stat = "slope")
          +

          The calculated slope value from our observed sample is \(b_1 = 0.067\).

          +
          +
          +

          11.1.4 Model of \(H_0\)

          +

          We are looking to see if a positive relationship exists so \(H_A: \beta_1 > 0\). Our null hypothesis is always in terms of equality so we have \(H_0: \beta_1 = 0\). In other words, when we assume the null hypothesis is true, we are assuming there is NOT a linear relationship between teaching and beauty scores for University of Texas professors.

          +
          +
          +

          11.1.5 Simulated data

          +

          Now to simulate the null hypothesis being true and recreating how our sample was created, we need to think about what it means for \(\beta_1\) to be zero. If \(\beta_1 = 0\), we said above that there is no relationship between the teaching and beauty scores. If there is no relationship, then any one of the teaching score values could have just as likely occurred with any of the other beauty score values instead of the one that it actually did fall with. We, therefore, have another example of permuting in our simulating of data under the null hypothesis.

          +

          Tactile simulation

          +

          We could use a deck of 926 note cards to create a tactile simulation of this permuting process. We would write the 463 different values of beauty scores on each of the 463 cards, one per card. We would then do the same thing for the 463 teaching scores putting them on one per card.

          +

          Next, we would lay out each of the 463 beauty score cards and we would shuffle the teaching score deck. Then, after shuffling the deck well, we would disperse the cards one per each one of the beauty score cards. We would then enter these new values in for teaching score and compute a sample slope based on this permuting. We could repeat this process many times, keeping track of our sample slope after each shuffle.

          +
          +
          +

          11.1.6 Distribution of \(\delta\) under \(H_0\)

          +

          We can build our null distribution in much the same way we did in Chapter 10 using the generate() and calculate() functions. Note also the addition of the hypothesize() function, which lets generate() know to perform the permuting instead of bootstrapping.

          +
          null_slope_distn <- evals %>% 
          +  specify(score ~ bty_avg) %>%
          +  hypothesize(null = "independence") %>% 
          +  generate(reps = 10000) %>% 
          +  calculate(stat = "slope")
          +
          null_slope_distn %>% 
          +  visualize(obs_stat = slope_obs, direction = "greater")
          +

          +

          In viewing the distribution above with shading to the right of our observed slope value of 0.067, we can see that we expect the p-value to be quite small. Let’s calculate it next using a similar syntax to what was done with visualize().

          +
          +
          +

          11.1.7 The p-value

          +
          null_slope_distn %>% 
          +  get_pvalue(obs_stat = slope_obs, direction = "greater")
          +
          # A tibble: 1 x 1
          +  p_value
          +    <dbl>
          +1       0
          +

          Since 0.067 falls far to the right of this plot beyond where any of the histogram bins have data, we can say that we have a \(p\)-value of 0. We, thus, have evidence to reject the null hypothesis in support of there being a positive association between the beauty score and teaching score of University of Texas faculty members.

          +
          +

          +Learning check +

          +
          +

          (LC11.1) Repeat the inference above but this time for the correlation coefficient instead of the slope. Note the implementation of stat = "correlation" in the calculate() function of the infer package.

          +
          + +
          +
          +
          +
          +

          11.2 Bootstrapping for the regression slope

          +

          With the p-value calculated as 0 in the hypothesis test above, we can next determine just how strong of a positive slope value we might expect between the variables of teaching score and beauty score (bty_avg) for University of Texas faculty. Recall the infer pipeline above to compute the null distribution. Recall that this assumes the null hypothesis is true that there is no relationship between teaching score and beauty score using the hypothesize() function.

          +
          null_slope_distn <- evals %>% 
          +  specify(score ~ bty_avg) %>%
          +  hypothesize(null = "independence") %>% 
          +  generate(reps = 10000, type = "permute") %>% 
          +  calculate(stat = "slope")
          +

To further reinforce the process being done in the pipeline, we’ve added the type argument to generate(). This is automatically added based on the entries for specify() and hypothesize(), but it provides a useful way to check that generate() is creating the samples in the desired way. In this case, we permuted the values of one variable across the values of the other 10,000 times and calculated a "slope" coefficient for each of these 10,000 generated samples.

          +

          If instead we’d like to get a range of plausible values for the true slope value, we can use the process of bootstrapping:
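The pipeline that creates bootstrap_slope_distn is not shown at this point in the excerpt; a plausible construction, assuming the same infer pattern as above but with bootstrap resampling instead of permutation, would be:

# Sketch of an assumed bootstrap pipeline for the slope
bootstrap_slope_distn <- evals %>% 
  specify(score ~ bty_avg) %>% 
  # no hypothesize() step: we resample the 463 rows with replacement
  generate(reps = 10000, type = "bootstrap") %>% 
  calculate(stat = "slope")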

          +
          bootstrap_slope_distn %>% visualize()
          +

          +

          Next we can use the get_ci() function to determine the confidence interval. Let’s do this in two different ways obtaining 99% confidence intervals. Remember that these denote a range of plausible values for an unknown true population slope parameter regressing teaching score on beauty score.

          +
          percentile_slope_ci <- bootstrap_slope_distn %>% 
          +  get_ci(level = 0.99, type = "percentile")
          +percentile_slope_ci
          +
          # A tibble: 1 x 2
          +  `0.5%` `99.5%`
          +   <dbl>   <dbl>
          +1 0.0229   0.110
          +
          se_slope_ci <- bootstrap_slope_distn %>% 
          +  get_ci(level = 0.99, type = "se", point_estimate = slope_obs)
          +se_slope_ci
          +
          # A tibble: 1 x 2
          +   lower upper
          +   <dbl> <dbl>
          +1 0.0220 0.111
          +

          With the bootstrap distribution being close to symmetric, it makes sense that the two resulting confidence intervals are similar.

          + +
          +
          +

          11.3 Inference for multiple regression

          +
          +

          11.3.1 Refresher: Professor evaluations data

          +

Let’s revisit the professor evaluations data that we analyzed using multiple regression with one numerical and one categorical predictor. In particular:

• \(y\): outcome variable of instructor evaluation score
• predictor variables
  • \(x_1\): numerical explanatory/predictor variable of age
  • \(x_2\): categorical explanatory/predictor variable of gender
          +
          library(ggplot2)
          +library(dplyr)
          +library(moderndive)
          +
          +evals_multiple <- evals %>%
          +  select(score, ethnicity, gender, language, age, bty_avg, rank)
          +

First, recall that we had two competing potential models to explain professors’ teaching scores:

1. Model 1: No interaction term, i.e. both male and female profs have the same slope describing the associated effect of age on teaching score
2. Model 2: Includes an interaction term, i.e. we allow for male and female profs to have different slopes describing the associated effect of age on teaching score
          +
          +

          11.3.2 Refresher: Visualizations

          +

          Recall the plots we made for both these models:

          +
+Figure 11.1: Model 1: no interaction effect included

+Figure 11.2: Model 2: interaction effect included

          +
          +
          +
          +

          11.3.3 Refresher: Regression tables

          +

Last, let’s recall the regressions we fit. First, the regression with no interaction effect: note the use of + in the formula.

          +
          score_model_2 <- lm(score ~ age + gender, data = evals_multiple)
          +get_regression_table(score_model_2)
Table 11.1: Model 1: Regression table with no interaction effect included

| term       | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|------------|----------|-----------|-----------|---------|----------|----------|
| intercept  | 4.484    | 0.125     | 35.79     | 0.000   | 4.238    | 4.730    |
| age        | -0.009   | 0.003     | -3.28     | 0.001   | -0.014   | -0.003   |
| gendermale | 0.191    | 0.052     | 3.63      | 0.000   | 0.087    | 0.294    |

          Second, the regression with an interaction effect: note the use of * in the formula.

          +
          score_model_3 <- lm(score ~ age * gender, data = evals_multiple)
          +get_regression_table(score_model_3)
Table 11.2: Model 2: Regression table with interaction effect included

| term           | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|----------------|----------|-----------|-----------|---------|----------|----------|
| intercept      | 4.883    | 0.205     | 23.80     | 0.000   | 4.480    | 5.286    |
| age            | -0.018   | 0.004     | -3.92     | 0.000   | -0.026   | -0.009   |
| gendermale     | -0.446   | 0.265     | -1.68     | 0.094   | -0.968   | 0.076    |
| age:gendermale | 0.014    | 0.006     | 2.45      | 0.015   | 0.003    | 0.024    |
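One way to read the interaction estimate in Table 11.2 is as a difference in slopes. A brief worked sketch, assuming "female" is the baseline level of gender (as the gendermale dummy coding suggests):

# Fitted slope for age, by gender, from the Table 11.2 estimates
slope_female <- -0.018            # baseline age coefficient
slope_male   <- -0.018 + 0.014    # add the age:gendermale interaction
slope_male
#> [1] -0.004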
          +
          +
          +

          11.3.4 Script of R code

          +

          An R script file of all R code used in this chapter is available here.

diff --git a/docs/13-thinking-with-data.html b/docs/previous_versions/v0.4.0/12-thinking-with-data.html
similarity index 71%
rename from docs/13-thinking-with-data.html
rename to docs/previous_versions/v0.4.0/12-thinking-with-data.html
index 56d5260ac..5e0ae1d8a 100644
--- a/docs/13-thinking-with-data.html
+++ b/docs/previous_versions/v0.4.0/12-thinking-with-data.html
@@ -5,11 +5,11 @@
-Chapter 13 Thinking with Data | Statistical Inference via Data Science
+12 Thinking with Data | An Introduction to Statistical and Data Sciences via R
[Remaining HTML header and navigation diff omitted]

        All this was our approach of guiding you through your first experiences of “thinking with data”, an expression originally coined by Diane Lambert of Google. How the philosophy underlying this expression guided our mapping of the flowchart above was well put in the introduction to the “Practical Data Science for Stats” collection of preprints focusing on the practical side of data science workflows and statistical analysis, curated by Jennifer Bryan and Hadley Wickham:

        @@ -603,11 +590,11 @@

        Chapter 13 Thinking with Data

        Data/Science Pipeline

-FIGURE 13.2: Data/Science Pipeline
+Figure 12.2: Data/Science Pipeline

        -

        In Section 13.1, we’ll take you through full-pass of the “Data/Science Pipeline” where we’ll analyze the sale price of houses in Seattle, WA, USA. In Section 13.2, we’ll present you with examples of effective data storytelling, in particular the articles from the data journalism website FiveThirtyEight.com, many of whose source datasets are accessible from the fivethirtyeight R package.

        -
        +

In Section 12.1, we’ll take you through a full pass of the “Data/Science Pipeline” where we’ll analyze the sale price of houses in Seattle, WA, USA. In Section 12.2, we’ll present you with examples of effective data storytelling, in particular the articles from the data journalism website FiveThirtyEight.com, many of whose source datasets are accessible from the fivethirtyeight R package.

        +

        Needed packages

        Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). Read Section 2.3 for information on how to install and load R packages.

        library(ggplot2)
        @@ -615,17 +602,17 @@ 

        Needed packages

library(moderndive)
library(fivethirtyeight)
        -
        +

        DataCamp

        -

        The case study of Seattle house prices below was the inspiration for a large part of ModernDive co-author Albert Y. Kim’s DataCamp course “Modeling with Data in the Tidyverse.” If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 “Introduction to Modeling” and Chapter 3 “Modeling with Multiple Regression.”

        +

        The case study of Seattle house prices below was the inspiration for a large part of ModernDive co-author Albert Y. Kim’s DataCamp course “Modeling with Data in the Tidyverse.” If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 “Introduction to Modeling” and Chapter 3 “Modeling with Multiple Regression”.


        Case studies involving data in the fivethirtyeight R package form the basis of ModernDive co-author Chester Ismay’s DataCamp course “Effective Data Storytelling in the Tidyverse.” This free course can be accessed here.

        +

Case studies involving data in the fivethirtyeight R package form the basis of ModernDive co-author Chester Ismay’s DataCamp course “Effective Data Storytelling in the Tidyverse”. This free course can be accessed here.


        -

        13.1 Case study: Seattle house prices

        +

        12.1 Case study: Seattle house prices

        Kaggle.com is a machine learning and predictive modeling competition website that hosts datasets uploaded by companies, governmental organizations, and other individuals. One of their datasets is the House Sales in King County, USA consisting of homes sold in between May 2014 and May 2015 in King County, Washington State, USA, which includes the greater Seattle metropolitan area. This CC0: Public Domain licensed dataset is included in the moderndive package in the house_prices data frame, which we’ll refer to as the “Seattle house prices” dataset.

The dataset consists of 21,613 houses and 21 variables describing these houses; for a full list of these variables see the help file by running ?house_prices in the console. In this case study, we’ll create a model using multiple regression where:

@@ -641,12 +628,12 @@ 13.1 Case study: Seattle house pr

library(dplyr)
library(moderndive)
          -

          13.1.1 Exploratory data analysis (EDA)

          +

          12.1.1 Exploratory data analysis (EDA)

A crucial first step before any formal modeling is an exploratory data analysis, commonly abbreviated as EDA. Exploratory data analysis can give you a sense of your data, help identify issues with your data, bring to light any outliers, and help inform model construction. There are three basic approaches to EDA:

1. Most fundamentally, just looking at the raw data. For example using RStudio’s View() spreadsheet viewer or the glimpse() function from the dplyr package
2. Creating visualizations like the ones using ggplot2 from Chapter 3
-3. Computing summary statistics using the dplyr data wrangling tools from Chapter 4
+3. Computing summary statistics using the dplyr data wrangling tools from Chapter 5

First, let’s look at the raw data using View() and the glimpse() function. Explore the dataset. Which variables are numerical and which are categorical? For the categorical variables, what are their levels? Which do you think would be useful variables to use in a model for house price? In this case study, we’ll only consider the variables price, sqft_living, and condition. An important thing to observe is that while the condition variable has values 1 through 5, these are saved in R as fct factors, i.e. R’s way of saving categorical variables. So you should think of these as the “labels” 1 through 5 and not the numerical values 1 through 5.

          View(house_prices)
          @@ -674,7 +661,7 @@ 

          13.1.1 Exploratory data analysis $ long <dbl> -122, -122, -122, -122, -122, -122, -122, -122, -122, -… $ sqft_living15 <int> 1340, 1690, 2720, 1360, 1800, 4760, 2238, 1650, 1780, 2… $ sqft_lot15 <int> 5650, 7639, 8062, 5000, 7503, 101930, 6819, 9711, 8113,…

          -

          Let’s now perform the second possible approach to EDA: creating visualizations. Since price and sqft_living are numerical variables, an appropriate way to visualize of these variables’ distributions would be using a histogram using a geom_histogram() as seen in Section 3.5. However, since condition is categorical, a barplot using a geom_bar() yields an appropriate visualization of its distribution. Recall from Section 3.8 that since condition is not “pre-counted”, we use a geom_bar() and not a geom_col(). In Figure 13.3, we display all three of these visualizations at once.

          +

Let’s now perform the second possible approach to EDA: creating visualizations. Since price and sqft_living are numerical variables, an appropriate way to visualize these variables’ distributions would be a histogram via a geom_histogram() as seen in Section 3.5. However, since condition is categorical, a barplot using a geom_bar() yields an appropriate visualization of its distribution. Recall from Section 3.8 that since condition is not “pre-counted”, we use a geom_bar() and not a geom_col(). In Figure 12.3, we display all three of these visualizations at once.

          # Histogram of house price:
           ggplot(house_prices, aes(x = price)) +
             geom_histogram(color = "white") +
          @@ -692,7 +679,7 @@ 

          13.1.1 Exploratory data analysis
          Exploratory visualizations of Seattle house prices data

-FIGURE 13.3: Exploratory visualizations of Seattle house prices data
+Figure 12.3: Exploratory visualizations of Seattle house prices data

          We observe the following:

          @@ -710,7 +697,7 @@

          13.1.1 Exploratory data analysis
        • Most houses are of condition 3, 4, or 5.
• In the case of price, why does the x-axis stretch so far to the right? It is because there are a very small number of houses with price closer to 8 million; these prices are outliers in this case. We say the variable is “right skewed” as exhibited by the long right tail. This skew makes it difficult to compare prices of the less expensive houses as the more expensive houses dominate the scale of the x-axis. This is similarly the case for sqft_living.

          -

          Let’s now perform the third possible approach to EDA: computing summary statistics. In particular, let’s compute 4 summary statistics using the summarize() data wrangling verb from Section 4.3.

          +

          Let’s now perform the third possible approach to EDA: computing summary statistics. In particular, let’s compute 4 summary statistics using the summarize() data wrangling verb from Section 5.4.

          • Two measures of center: the mean and median
          • Two measures of variability/spread: the standard deviation and interquartile-range (IQR = 3rd quartile - 1st quartile)
          • @@ -729,132 +716,65 @@

            13.1.1 Exploratory data analysis

            Observe the following:

1. The mean price of $540,088 is larger than the median of $450,000. This is because the small number of very expensive outlier house prices is inflating the average, whereas since the median is the “middle” value, it is not as sensitive to such large values at the high end. This is why the news typically reports median house prices and not average house prices when describing the real estate market. We say here that the median is more “robust to outliers” than the mean.
-2. Similarly, while the standard deviation and IQR are both measures of spread and variability, the IQR is more “robust to outliers.”
+2. Similarly, while the standard deviation and IQR are both measures of spread and variability, the IQR is more “robust to outliers”.

            If you repeat the above summarize() for sqft_living, you’ll find a similar relationship between mean vs median and standard deviation vs IQR given its similar right-skewed nature. Is there anything we can do about this right-skew? Again, this could potentially be an issue because we’ll have a harder time discriminating between houses at the lower end of price and sqft_living, which might lead to a problem when modeling.

            We can in fact address this issue by using a log base 10 transformation, which we cover next.

          -

          13.1.2 log10 transformations

          -

          At its simplest, log10() transformations returns base 10 logarithms. For example, since \(1000 = 10^3\), log10(1000) returns 3. To undo a log10-transformation, we raise 10 to this value. For example, to undo the previous log10-transformation and return the original value of 1000, we raise 10 to this value \(10^{3}\) by running 10^(3) = 1000. log-transformations allow us to focus on multiplicative changes instead of additive ones, thereby emphasizing changes in “orders of magnitude.” Let’s illustrate this idea in Table 13.1 with examples of prices of consumer goods in US dollars.

          - - +

          12.1.2 log10 transformations

          +

At its simplest, a log10() transformation returns base 10 logarithms. For example, since \(1000 = 10^3\), log10(1000) returns 3. To undo a log10-transformation, we raise 10 to this value. For example, to undo the previous log10-transformation and return the original value of 1000, we raise 10 to this value \(10^{3}\) by running 10^(3) = 1000. log-transformations allow us to focus on multiplicative changes instead of additive ones, thereby emphasizing changes in “orders of magnitude.” Let’s illustrate this idea in Table ?? with examples of prices of consumer goods in US dollars.

          +
          -TABLE 13.1: log10-transformed prices, orders of magnitude, and examples -
          - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + + - - - - - + + + + +
          -Price - -log10(Price) - -Order of magnitude - -Examples -
          Pricelog10(Price)Order of magnitudeExamples
          -$1 - -0 - -Singles - -Cups of coffee -
          $10SinglesCups of coffee
          -$10 - -1 - -Tens - -Books -
          $101TensBooks
          -$100 - -2 - -Hundreds - -Mobile phones -
          $1002HundredsMobile phones
          -$1,000 - -3 - -Thousands - -High definition TV’s -
          $1,0003ThousandsHigh definition TV’s
          -$10,000 - -4 - -Tens of thousands - -Cars -
          $10,0004Tens of thousandsCars
          -$100,000 - -5 - -Hundreds of thousands - -Luxury cars & houses -
          $100,0005Hundreds of thousandsLuxury cars & houses
          -$1,000,000 - -6 - -Millions - -Luxury houses -
          $1,000,0006MillionsLuxury houses
          @@ -865,7 +785,7 @@

          13.1.2 log10 transformations

        • log10-transformations are monotonic, meaning they preserve orderings. So if Price A is lower than Price B, then log10(Price A) will also be lower than log10(Price B).
        • Most importantly, increments of one in log10 correspond to multiplicative changes and not additive ones. For example, increasing from log10(Price) of 3 to 4 corresponds to a multiplicative increase by a factor of 10: $100 to $1000.
        • -

          Let’s create new log10-transformed versions of the right-skewed variable price and sqft_living using the mutate() function from Section 4.5, but we’ll give the latter the name log10_size, which is a little more succinct and descriptive a variable name.

          +

          Let’s create new log10-transformed versions of the right-skewed variable price and sqft_living using the mutate() function from Section 5.6, but we’ll give the latter the name log10_size, which is a little more succinct and descriptive a variable name.

          house_prices <- house_prices %>%
             mutate(
               log10_price = log10(price),
          @@ -892,7 +812,7 @@ 

          13.1.2 log10 transformations

        • The house in the 6th row with price $1,225,000, which is just above one million dollars. Since \(10^6\) is one million, its log10_price is 6.09. Contrast this with all other houses with log10_price less than 6.
• Similarly, there is only one house with size sqft_living less than 1000. Since \(1000 = 10^3\), it’s the lone house with log10_size less than 3.
        -

        Let’s now visualize the before and after effects of this transformation for price in Figure 13.4.

        +

        Let’s now visualize the before and after effects of this transformation for price in Figure 12.4.

        # Before:
         ggplot(house_prices, aes(x = price)) +
           geom_histogram(color = "white") +
        @@ -905,10 +825,10 @@ 

        13.1.2 log10 transformations

        House price before and after log10-transformation

-FIGURE 13.4: House price before and after log10-transformation
+Figure 12.4: House price before and after log10-transformation

        -

        Observe that after the transformation, the distribution is much less skewed, and in this case, more symmetric and bell-shaped, although this isn’t always necessarily the case. Now you can now better discriminate between house prices at the lower end of the scale. Let’s do the same for size where the before variable is sqft_living and the after variable is log10_size. Observe in Figure 13.5 that the log10-transformation has a similar effect of un-skewing the variable. Again, we emphasize that while in these two cases the resulting distributions are more symmetric and bell-shaped, this is not always necessarily the case.

        +

Observe that after the transformation, the distribution is much less skewed, and in this case, more symmetric and bell-shaped, although this isn’t always necessarily the case. Now you can better discriminate between house prices at the lower end of the scale. Let’s do the same for size where the before variable is sqft_living and the after variable is log10_size. Observe in Figure 12.5 that the log10-transformation has a similar effect of un-skewing the variable. Again, we emphasize that while in these two cases the resulting distributions are more symmetric and bell-shaped, this is not always necessarily the case.

        # Before:
         ggplot(house_prices, aes(x = sqft_living)) +
           geom_histogram(color = "white") +
        @@ -921,7 +841,7 @@ 

        13.1.2 log10 transformations

        House size before and after log10-transformation

-FIGURE 13.5: House size before and after log10-transformation
+Figure 12.5: House size before and after log10-transformation

        Given the now un-skewed nature of log10_price and log10_size, we are going to revise our modeling structure:

        @@ -935,16 +855,16 @@

        13.1.2 log10 transformations

      -

      13.1.3 EDA Part II

      -

      Let’s continue our exploratory data analysis from Subsection 13.1.1 above. The earlier EDA you performed was univariate in nature in that we only considered one variable at a time. The goal of modeling, however, is to explore relationships between variables. So we must jointly consider the relationship between the outcome variable log10_price and the explanatory/predictor variables log10_size (numerical) and condition (categorical). We viewed such a modeling scenario in Section 7.2 using the evals dataset, where the outcome variable was teaching score, the numerical explanatory/predictor variable was instructor age and the categorical explanatory/predictor variable was (binary) gender.

      -

      We have two possible visual models. Either a parallel slopes model in Figure 13.6 where we have a different regression line for each of the 5 possible condition levels, each with a different intercept but the same slope:

      +

      12.1.3 EDA Part II

      +

      Let’s continue our exploratory data analysis from Subsection 12.1.1 above. The earlier EDA you performed was univariate in nature in that we only considered one variable at a time. The goal of modeling, however, is to explore relationships between variables. So we must jointly consider the relationship between the outcome variable log10_price and the explanatory/predictor variables log10_size (numerical) and condition (categorical). We viewed such a modeling scenario in Section 7.2 using the evals dataset, where the outcome variable was teaching score, the numerical explanatory/predictor variable was instructor age and the categorical explanatory/predictor variable was (binary) gender.

      +

      We have two possible visual models. Either a parallel slopes model in Figure 12.6 where we have a different regression line for each of the 5 possible condition levels, each with a different intercept but the same slope:

      Parallel slopes model

-FIGURE 13.6: Parallel slopes model
+Figure 12.6: Parallel slopes model

      -

      Or an interaction model in Figure 13.7, where we allow each regression line to not only have different intercepts, but different slopes as well:

      +

      Or an interaction model in Figure 12.7, where we allow each regression line to not only have different intercepts, but different slopes as well:

      ggplot(house_prices, aes(x = log10_size, y = log10_price, col = condition)) +
         geom_point(alpha = 0.1) +
         labs(y = "log10 price", x = "log10 size", title = "House prices in Seattle") +
      @@ -952,10 +872,10 @@ 

      13.1.3 EDA Part II

      Interaction model

-FIGURE 13.7: Interaction model
+Figure 12.7: Interaction model

      -

      In both cases, we see there is a positive relationship between house price and size, meaning as houses are larger, they tend to be more expensive. Furthermore, in both plots it seems that houses of condition 5 tend to be the most expensive for most house sizes as evidenced by the fact that the purple line is highest, followed by condition 4 and 3. As for condition 1 and 2, this pattern isn’t as clear, as if you recall from the univariate barplot of condition in Figure 13.3 there are very few houses of condition 1 or 2. This reality is more apparent in an alternative visualization to Figure 13.7 displayed in Figure 13.8 that uses facets instead:

      +

In both cases, we see there is a positive relationship between house price and size, meaning as houses are larger, they tend to be more expensive. Furthermore, in both plots it seems that houses of condition 5 tend to be the most expensive for most house sizes as evidenced by the fact that the purple line is highest, followed by condition 4 and 3. As for condition 1 and 2, this pattern isn’t as clear, as if you recall from the univariate barplot of condition in Figure 12.3 there are very few houses of condition 1 or 2. This reality is more apparent in an alternative visualization to Figure 12.7 displayed in Figure 12.8 that uses facets instead:

      ggplot(house_prices, aes(x = log10_size, y = log10_price, col = condition)) +
         geom_point(alpha = 0.3) +
         labs(y = "log10 price", x = "log10 size", title = "House prices in Seattle") +
      @@ -964,14 +884,14 @@ 

      13.1.3 EDA Part II

      Interaction model with facets

-FIGURE 13.8: Interaction model with facets
+Figure 12.8: Interaction model with facets

      -

      Which exploratory visualization of the interaction model is better, the one in Figure 13.7 or Figure 13.8? There is no universal right answer, you need to make a choice depending on what you want to convey, and own it.

      +

      Which exploratory visualization of the interaction model is better, the one in Figure 12.7 or Figure 12.8? There is no universal right answer, you need to make a choice depending on what you want to convey, and own it.

      -

      13.1.4 Regression modeling

      -

      For now let’s focus on the latter, interaction model we’ve visualized in Figure 13.8 above. What are the 5 different slopes and intercepts for the condition = 1, condition = 2, …, and condition = 5 lines in Figure 13.8? To determine these, we first need the values from the regression table:

      +

      12.1.4 Regression modeling

      +

      For now let’s focus on the latter, interaction model we’ve visualized in Figure 12.8 above. What are the 5 different slopes and intercepts for the condition = 1, condition = 2, …, and condition = 5 lines in Figure 12.8? To determine these, we first need the values from the regression table:

      # Fit regression model:
       price_interaction <- lm(log10_price ~ log10_size * condition, data = house_prices)
       # Get regression table:
      @@ -999,7 +919,7 @@ 

      13.1.4 Regression modeling

    • Condition 4: \(\widehat{\log10(\text{price})} = (3.33 - 0.398) + (0.69 + 0.146) * \log10(\text{size}) = 2.93 + 0.836 * \log10(\text{size})\)
    • Condition 5: \(\widehat{\log10(\text{price})} = (3.33 - 0.883) + (0.69 + 0.31) * \log10(\text{size}) = 2.45 + 1 * \log10(\text{size})\)
    • -

      These correspond to the regression lines in the exploratory visualization of the interaction model in Figure 13.7 above. For homes of all 5 condition types, as the size of the house increases, the prices increases. This is what most would expect. However, the rate of increase of price with size is fastest for the homes with condition 3, 4, and 5 of 0.823, 0.836, and 1 respectively; these are the 3 most largest slopes out of the 5.

      +

These correspond to the regression lines in the exploratory visualization of the interaction model in Figure 12.7 above. For homes of all 5 condition types, as the size of the house increases, the price increases. This is what most would expect. However, the rate of increase of price with size is fastest for the homes with condition 3, 4, and 5 of 0.823, 0.836, and 1 respectively; these are the 3 largest slopes out of the 5.

      -

      13.1.5 Making predictions

      -

      Say you’re a realtor and someone calls you asking you how much their home will sell for. They tell you that it’s in condition = 5 and is sized 1900 square feet. What do you tell them? We first make this prediction visually in Figure 13.9. The predicted log10_price of this house is marked with a black dot: it is where the two following lines intersect:

      +

      12.1.5 Making predictions

      +

      Say you’re a realtor and someone calls you asking you how much their home will sell for. They tell you that it’s in condition = 5 and is sized 1900 square feet. What do you tell them? We first make this prediction visually in Figure 12.9. The predicted log10_price of this house is marked with a black dot: it is where the two following lines intersect:

      • The purple regression line for the condition = 5 homes and
      • The vertical dashed black line at log10_size equals 3.28, since our predictor variable is the log10-transformed square feet of living space and \(\log10(1900) = 3.28\) .
      • @@ -1016,14 +936,14 @@

        13.1.5 Making predictions

        Interaction model with prediction

-FIGURE 13.9: Interaction model with prediction
+Figure 12.9: Interaction model with prediction

Eyeballing it, the predicted log10_price seems to be around 5.72. Let’s now obtain an exact numerical value for the prediction using the values of the intercept and slope for condition = 5 that we computed from the regression table output. We use the equation for the condition = 5 line, being sure to log10() the square footage first.

        2.45 + 1 * log10(1900)
        [1] 5.73
        -

        This value is very close to our earlier visually made prediction of 5.72. But wait! We were using the outcome variable log10_price as our outcome variable! So if we want a prediction in terms of price in dollar units, we need to un-log this by taking a power of 10 as described in Section 13.1.2.

        +

This value is very close to our earlier visually made prediction of 5.72. But wait! We were using log10_price as our outcome variable! So if we want a prediction in terms of price in dollar units, we need to un-log this by taking a power of 10 as described in Section 12.1.2.

        10^(2.45 + 1 * log10(1900))
        [1] 535493

So our predicted price for this home of condition 5 and size 1900 square feet is $535,493.

        Learning check
      (LC12.1) Repeat the regression modeling in Subsection 12.1.4 and the prediction making you just did on the house of condition 5 and size 1900 square feet in Subsection 12.1.5, but using the parallel slopes model you visualized in Figure 12.6. Hint: it’s $524,807!
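      A minimal sketch of how this learning check could be set up, assuming the house_prices data frame and the log10 transformations used for the interaction model earlier in the chapter (the object name price_parallel and the mutate() step are our assumptions, not code from the book):

```r
library(dplyr)
library(moderndive)

# Assumed setup: log10 transformations of price and living space
house_prices <- house_prices %>%
  mutate(log10_price = log10(price),
         log10_size  = log10(sqft_living))

# Parallel slopes model: one common slope for log10_size, one offset per condition
price_parallel <- lm(log10_price ~ log10_size + condition, data = house_prices)
get_regression_table(price_parallel)

# As with the interaction model above, plug the condition = 5 intercept and the
# common slope into the prediction formula, then un-log:
# 10^((intercept + condition_5_offset) + log10_size_slope * log10(1900))
```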

diff --git a/docs/previous_versions/v0.4.0/3-tidy.html b/docs/previous_versions/v0.4.0/3-tidy.html
new file mode 100644
index 000000000..e69de29bb

diff --git a/docs/previous_versions/v0.4.0/3-viz.html b/docs/previous_versions/v0.4.0/3-viz.html
new file mode 100644
index 000000000..871f30f95
--- /dev/null
+++ b/docs/previous_versions/v0.4.0/3-viz.html
@@ -0,0 +1,1711 @@

      3 Data Visualization via ggplot2

      +

      We begin the development of your data science toolbox with data visualization. By visualizing our data, we will be able to gain valuable insights from our data that we couldn’t initially see from just looking at the raw data in spreadsheet form. We will use the ggplot2 package as it provides an easy way to customize your plots and is rooted in the data visualization theory known as The Grammar of Graphics (Wilkinson 2005).

      +

      At the most basic level, graphics/plots/charts (we use these terms interchangeably in this book) provide a nice way for us to get a sense for how quantitative variables compare in terms of their center (where the values tend to be located) and their spread (how they vary around the center). The most important thing to know about graphics is that they should be created to make it obvious for your audience to understand the findings and insight you want to get across. This does however require a balancing act. On the one hand, you want to highlight as many meaningful relationships and interesting findings as possible, but on the other you don’t want to include so many as to overwhelm your audience.

      +

      As we will see, plots/graphics also help us to identify patterns and outliers in our data. We will see that a common extension of these ideas is to compare the distribution of one quantitative variable (i.e., what the spread of a variable looks like or how the variable is distributed in terms of its values) as we go across the levels of a different categorical variable.

      +
      +

      Needed packages

      +

      Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). Read Section 2.3 for information on how to install and load R packages.

      +
      library(nycflights13)
      +library(ggplot2)
      +library(dplyr)
      +
      +
      +

      DataCamp

      +

      Our approach to introducing data visualization via the Grammar of Graphics and the ggplot2 package is very similar to the approach taken in David Robinson’s DataCamp course “Introduction to the Tidyverse,” a course targeted at people new to R and the tidyverse. If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters of the course are Chapter 2 on “Data visualization” and Chapter 4 on “Types of visualizations”.

      +
      +Drawing +
      +
      +
      +

      3.1 The Grammar of Graphics

      +

      We begin with a discussion of a theoretical framework for data visualization known as the “The Grammar of Graphics,” which serves as the basis for the ggplot2 package. Much like how we construct sentences in any language by using a linguistic grammar (nouns, verbs, subjects, objects, etc.), the theoretical framework given by Leland Wilkinson (Wilkinson 2005) allows us to specify the components of a statistical graphic.

      +
      +

      3.1.1 Components of the Grammar

      +

      In short, the grammar tells us that:

      +
      +

      A statistical graphic is a mapping of data variables to aesthetic attributes of geometric objects.

      +
      +

      Specifically, we can break a graphic into the following three essential components:

      1. data: the data-set comprised of variables that we map.
      2. geom: the geometric object in question. This refers to the type of objects we can observe in our plot. For example: points, lines, bars, etc.
      3. aes: aesthetic attributes of the geometric object that we can perceive on a graphic. For example, x/y position, color, shape, and size. Each assigned aesthetic attribute can be mapped to a variable in our data-set.

      Let’s break down the grammar with an example.

      +
      +
      +

      3.1.2 Gapminder

      +

      In February 2006, a statistician named Hans Rosling gave a TED talk titled “The best stats you’ve ever seen” where he presented global economic, health, and development data from the website gapminder.org. For example, of the 1704 country-year rows in this data (142 countries observed over multiple years), consider only the 142 rows for 2007, and of those only the first 6 countries when listed alphabetically:

      Table 3.1: Gapminder 2007 Data: First 6 of 142 countries

      | Country     | Continent | Life Expectancy | Population | GDP per Capita |
      |-------------|-----------|-----------------|------------|----------------|
      | Afghanistan | Asia      | 43.83           | 31889923   | 974.58         |
      | Albania     | Europe    | 76.42           | 3600523    | 5937.03        |
      | Algeria     | Africa    | 72.30           | 33333216   | 6223.37        |
      | Angola      | Africa    | 42.73           | 12420476   | 4797.23        |
      | Argentina   | Americas  | 75.32           | 40301927   | 12779.38       |
      | Australia   | Oceania   | 81.23           | 20434176   | 34435.37       |

      Each row in this table corresponds to a country in 2007. For each row, we have 5 columns:

      +
      1. Country: Name of country.
      2. Continent: Which of the five continents the country is part of. (Note that Americas groups North and South America and that Antarctica is excluded here.)
      3. Life Expectancy: Life expectancy in years.
      4. Population: Number of people living in the country.
      5. GDP per Capita: Gross domestic product per person (in US dollars).

      Now consider Figure 3.1, which plots this data for all 142 countries in the data frame. Note that R will deal with large numbers using scientific notation. So in the legend for “Population”, 1.25e+09 = \(1.25 \times 10^{9}\) = 1,250,000,000 = 1.25 billion.

      +
      +Life Expectancy over GDP per Capita in 2007 +

      +Figure 3.1: Life Expectancy over GDP per Capita in 2007 +

      +
      +

      Let’s view this plot through the grammar of graphics:

      +
      1. The data variable GDP per Capita gets mapped to the x-position aesthetic of the points.
      2. The data variable Life Expectancy gets mapped to the y-position aesthetic of the points.
      3. The data variable Population gets mapped to the size aesthetic of the points.
      4. The data variable Continent gets mapped to the color aesthetic of the points.

      Recall that data here corresponds to each of the variables being in the same data frame and the “data variable” corresponds to a column in a data frame.

      +

      While in this example we are considering one type of geometric object (of type point), graphics are not limited to just points. Some plots involve lines while others involve bars. Let’s summarize the three essential components of the grammar in a table:

      Table 3.2: Summary of Grammar of Graphics for this plot

      | data variable   | aes   | geom  |
      |-----------------|-------|-------|
      | GDP per Capita  | x     | point |
      | Life Expectancy | y     | point |
      | Population      | size  | point |
      | Continent       | color | point |
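      The ggplot2 code for Figure 3.1 isn’t shown at this point in the chapter. A rough sketch of how such a plot could be built is below, assuming the gapminder data frame from the gapminder package (the package and its column names gdpPercap, lifeExp, pop, and continent are our assumptions here, not part of the original chapter):

```r
library(ggplot2)
library(dplyr)
library(gapminder)

# Keep only the 142 rows for the year 2007, then map the four data variables
# to the x, y, size, and color aesthetics of points:
gapminder_2007 <- gapminder %>% filter(year == 2007)
ggplot(data = gapminder_2007,
       mapping = aes(x = gdpPercap, y = lifeExp, size = pop, color = continent)) +
  geom_point()
```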

      3.1.3 Other components of the Grammar

      +

      There are other components of the Grammar of Graphics we can control. As you start to delve deeper into the Grammar of Graphics, you’ll start to encounter these topics more and more often. In this book, we’ll only work with the two other components below (The other components are left to a more advanced text such as R for Data Science (Grolemund and Wickham 2016)):

      +
        +
      • faceting breaks up a plot into small multiples corresponding to the levels of another variable (Section 3.6)
      • +
      • position adjustments for barplots (Section 3.8) +
      • +
      +

      In general, the Grammar of Graphics allows for a high degree of customization and also a consistent framework for easy updating/modification of plots.

      +
      +
      +

      3.1.4 The ggplot2 package

      +

      In this book, we will be using the ggplot2 package for data visualization, which is an implementation of the Grammar of Graphics for R (Wickham et al. 2018). You may have noticed that a lot of the previous text in this chapter is written in computer font. This is because the various components of the Grammar of Graphics are specified in the ggplot function, which expects at a bare minimum as arguments:

      +
        +
      • The data frame where the variables exist: the data argument
      • +
      • The mapping of the variables to aesthetic attributes: the mapping argument, which specifies the aesthetic attributes involved
      • +
      +

      After we’ve specified these components, we then add layers to the plot using the + sign. The most essential layer to add to a plot is the specification of which type of geometric object we want the plot to involve; e.g. points, lines, bars. Other layers we can add include the specification of the plot title, axes labels, facets, and visual themes for the plot.

      +

      Let’s now put the theory of the Grammar of Graphics into practice.

      + + +
      +
      +
      +

      3.2 Five Named Graphs - The 5NG

      +

      For our purposes, we will be limiting consideration to five different types of graphs. We term these five named graphs the 5NG:

      +
        +
      1. scatterplots
      2. +
      3. linegraphs
      4. +
      5. boxplots
      6. +
      7. histograms
      8. +
      9. barplots
      10. +
      +

      We will discuss some variations of these plots, but with this basic repertoire in your toolbox you can visualize a wide array of different data variable types. Note that certain plots are only appropriate for categorical/logical variables and others only for quantitative variables. You’ll want to quiz yourself often as we go along on which plot makes sense given a particular problem or data-set.

      + +
      +
      +

      3.3 5NG#1: Scatterplots

      +

      The simplest of the 5NG are scatterplots (also called bivariate plots); they allow you to investigate the relationship between two numerical variables. While you may already be familiar with this type of plot, let’s view it through the lens of the Grammar of Graphics. Specifically, we will graphically investigate the relationship between the following two numerical variables in the flights data frame:

      +
      1. dep_delay: departure delay on the horizontal “x” axis and
      2. arr_delay: arrival delay on the vertical “y” axis

      for Alaska Airlines flights leaving NYC in 2013. This requires paring down the flights data frame to a smaller data frame all_alaska_flights consisting of only Alaska Airlines (carrier code "AS") flights. Don’t worry for now if you don’t fully understand what this code is doing; we’ll explain it in detail in Chapter 5. Just run it all and understand that we are taking all flights and only considering those corresponding to Alaska Airlines.

      +
      all_alaska_flights <- flights %>% 
      +  filter(carrier == "AS")
      +

      This code snippet makes use of functions in the dplyr package for data wrangling to achieve our goal: it takes the flights data frame and filters it to only return the rows which meet the condition carrier == "AS". Recall from Section 2.2 that testing for equality is specified with == and not =. You will see many more examples of == and filter() in Chapter 5.

      +
      +

      +Learning check +

      +
      +

      (LC3.1) Take a look at both the flights and all_alaska_flights data frames by running View(flights) and View(all_alaska_flights) in the console. In what respect do these data frames differ?

      +
      + +
      +
      +

      3.3.1 Scatterplots via geom_point

      +

      We proceed to create the scatterplot using the ggplot() function:

      +
      ggplot(data = all_alaska_flights, mapping = aes(x = dep_delay, y = arr_delay)) + 
      +  geom_point()
      +
      +Arrival Delays vs Departure Delays for Alaska Airlines flights from NYC in 2013 +

      +Figure 3.2: Arrival Delays vs Departure Delays for Alaska Airlines flights from NYC in 2013 +

      +
      +

      In Figure 3.2 we see that a positive relationship exists between dep_delay and arr_delay: as departure delays increase, arrival delays tend to also increase. We also note that the majority of points fall near the point (0, 0). There is a large mass of points clustered there. Furthermore after executing this code, R returns a warning message alerting us to the fact that 5 rows were ignored due to missing values. For 5 rows either the value for dep_delay or arr_delay or both were missing, and thus these rows were ignored in our plot.

      +

      Let’s go back to the ggplot() function call that created this visualization, keeping in mind our discussion in Section 3.1:

      +
      • Within the ggplot() function call, we specify two of the components of the grammar:
        1. The data frame to be all_alaska_flights by setting data = all_alaska_flights
        2. The aesthetic mapping by setting aes(x = dep_delay, y = arr_delay). Specifically:
           • the variable dep_delay maps to the x position aesthetic
           • the variable arr_delay maps to the y position aesthetic
      • We add a layer to the ggplot() function call using the + sign. The layer in question specifies the third component of the grammar: the geometric object. In this case the geometric objects are points, set by specifying geom_point().

      Some notes on layers:

      +
        +
      • Note that the + sign comes at the end of lines, and not at the beginning. You’ll get an error in R if you put it at the beginning.
      • +
      • When adding layers to a plot, you are encouraged to hit Return on your keyboard after entering the + so that the code for each layer is on a new line. As we add more and more layers to plots, you’ll see this will greatly improve the legibility of your code.
      • +
      • To stress the importance of adding layers, in particular the layer specifying the geometric object, consider Figure 3.3 where no layers are added. A not very useful plot!
      • +
      +
      ggplot(data = all_alaska_flights, mapping = aes(x = dep_delay, y = arr_delay))
      +
      +Plot with No Layers +

      +Figure 3.3: Plot with No Layers +

      +
      +
      +

      +Learning check +

      +
      +

      (LC3.2) What are some practical reasons why dep_delay and arr_delay have a positive relationship?

      +

      (LC3.3) What variables (not necessarily in the flights data frame) would you expect to have a negative correlation (i.e. a negative relationship) with dep_delay? Why? Remember that we are focusing on numerical variables here.

      +

      (LC3.4) Why do you believe there is a cluster of points near (0, 0)? What does (0, 0) correspond to in terms of the Alaskan flights?

      +

      (LC3.5) What are some other features of the plot that stand out to you?

      +

      (LC3.6) Create a new scatterplot using different variables in the all_alaska_flights data frame by modifying the example above.
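      One possible sketch for this learning check (the choice of variables is ours, not the book’s): distance flown versus time in the air.

```r
library(ggplot2)
# Two other numerical variables in all_alaska_flights: distance and air_time
ggplot(data = all_alaska_flights, mapping = aes(x = distance, y = air_time)) +
  geom_point()
```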

      +
      + +
      +
      +
      +

      3.3.2 Over-plotting

      +

      The large mass of points near (0, 0) in Figure 3.2 can cause some confusion. This is the result of a phenomenon called overplotting. As one may guess, this corresponds to values being plotted on top of each other over and over again. It is often difficult to know just how many values are plotted in this way when looking at a basic scatterplot as we have here. There are two ways to address this issue:

      +
        +
      1. By adjusting the transparency of the points via the alpha argument
      2. By jittering the points via geom_jitter()

      The first way of relieving overplotting is by changing the alpha argument in geom_point() which controls the transparency of the points. By default, this value is set to 1. We can change this to any value between 0 and 1 where 0 sets the points to be 100% transparent and 1 sets the points to be 100% opaque. Note how the following function call is identical to the one in Section 3.3, but with alpha = 0.2 added to the geom_point().

      +
      ggplot(data = all_alaska_flights, mapping = aes(x = dep_delay, y = arr_delay)) + 
      +  geom_point(alpha = 0.2)
      +
      +Delay scatterplot with alpha=0.2 +

      +Figure 3.4: Delay scatterplot with alpha=0.2 +

      +
      +

      The key feature to note in Figure 3.4 is that the transparency of the points is cumulative: areas with a high-degree of overplotting are darker, whereas areas with a lower degree are less dark.

      +

      Note that there is no aes() surrounding alpha = 0.2 here. Since we are NOT mapping a variable to an aesthetic but instead are just changing a setting, we don’t need to create a mapping with aes(). In fact, you’ll receive an error if you try to change the second line above to geom_point(aes(alpha = 0.2)).

      +

      The second way of relieving overplotting is to jitter the points a bit. In other words, we are going to add just a bit of random noise to the points to better see them and alleviate some of the overplotting. You can think of “jittering” as shaking the points around a bit on the plot. Let’s illustrate using a simple example first. Say we have a data frame jitter_example with 4 rows of identical value 0 for both x and y:

      +
      jitter_example
      +
      # A tibble: 4 x 2
      +      x     y
      +  <dbl> <dbl>
      +1     0     0
      +2     0     0
      +3     0     0
      +4     0     0
      +

      We display the resulting scatterplot in Figure 3.5; observe that the 4 points are superimposed on top of each other. While we know there are 4 values being plotted, this fact might not be apparent to others.

      Figure 3.5: Regular scatterplot of jitter example data

      In Figure 3.6 we instead display a jittered scatterplot. Since each point is given a random “nudge”, it is now plainly evident that there are four points.

      Figure 3.6: Jittered scatterplot of jitter example data
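      The code behind Figures 3.5 and 3.6 is not shown in the chapter; a minimal sketch, assuming jitter_example is the 4-row data frame printed above, might look like this:

```r
library(ggplot2)
library(tibble)

# Recreate the toy data frame of four identical (0, 0) points:
jitter_example <- tibble(x = rep(0, 4), y = rep(0, 4))

# Figure 3.5: all four points plotted on top of one another
ggplot(data = jitter_example, mapping = aes(x = x, y = y)) +
  geom_point()

# Figure 3.6: each point given a small random nudge so all four are visible
ggplot(data = jitter_example, mapping = aes(x = x, y = y)) +
  geom_jitter(width = 0.01, height = 0.01)
```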

      To create a jittered scatterplot, instead of using geom_point, we use geom_jitter. To specify how much jitter to add, we adjust the width and height arguments. This corresponds to how hard you’d like to shake the plot in units corresponding to those for both the horizontal and vertical variables (in this case, minutes). It is important to add just enough jitter to break any overlap in points, but not so much that we completely obscure the overall pattern in points.

      +
      ggplot(data = all_alaska_flights, mapping = aes(x = dep_delay, y = arr_delay)) + 
      +  geom_jitter(width = 30, height = 30)
      +
      +Jittered delay scatterplot +

      +Figure 3.7: Jittered delay scatterplot +

      +
      +

      Observe how this function call is identical to the one in Subsection 3.3.1, but with geom_point() replaced with geom_jitter(). Also, it is important to note that geom_jitter() is strictly a visualization tool and does not alter the original values saved in the data frame.

      +

      The plot in Figure 3.7 helps us a little bit in getting a sense for the overplotting, but with a relatively large data-set like this one (714 flights), it can be argued that changing the transparency of the points by setting alpha proved more effective.

      +

      Furthermore, we’ll see later on that the two following R commands will yield the exact same plot:

      +
      ggplot(data = all_alaska_flights, mapping = aes(x = dep_delay, y = arr_delay)) + 
      +  geom_jitter(width = 30, height = 30)
      +ggplot(all_alaska_flights, aes(x = dep_delay, y = arr_delay)) + 
      +  geom_jitter(width = 30, height = 30)
      +

      In other words you can drop the data = and mapping = if you keep the order of the two arguments the same. Since the ggplot() function is expecting its first argument data to be a data frame and its second argument to correspond to mapping =, you can omit both and you’ll get the same plot. As you get more and more practice, you’ll likely find yourself not including the specification of the argument like this. But for now to keep things straightforward let’s make it a point to include the data = and mapping =.

      +
      +

      +Learning check +

      +
      +

      (LC3.7) Why is setting the alpha argument value useful with scatterplots? What further information does it give you that a regular scatterplot cannot?

      +

      (LC3.8) After viewing the Figure 3.4 above, give an approximate range of arrival delays and departure delays that occur the most frequently. How has that region changed compared to when you observed the same plot without the alpha = 0.2 set in Figure 3.2?

      +
      + +
      + +
      +
      +

      3.3.3 Summary

      +

      Scatterplots display the relationship between two numerical variables. They are among the most commonly used plots because they can provide an immediate way to see the trend in one variable versus another. However, if you try to create a scatterplot where either one of the two variables is not numerical, you will get strange results. Be careful!

      +

      With medium to large data-sets, you may need to play with either geom_jitter() or the alpha argument in order to get a good feel for relationships in your data. This tweaking is often a fun part of data visualization since you’ll have the chance to see different relationships come about as you make subtle changes to your plots.

      +
      +
      +
      +

      3.4 5NG#2: Linegraphs

      +

      The next of the 5NG is a linegraph. They are most frequently used when the x-axis represents time and the y-axis represents some other numerical variable; such plots are known as time series. Time represents a variable that is connected together by each day following the previous day. In other words, time has a natural ordering. Linegraphs should be avoided when there is not a clear sequential ordering to the explanatory variable, i.e. the x-variable or the predictor variable.

      +

      Our focus now turns to the temp variable in this weather data-set. By

      +
        +
      • Looking over the weather data-set by typing View(weather) in the console.
      • +
      • Running ?weather to bring up the help file.
      • +
      +

      We can see that the temp variable corresponds to hourly temperature (in Fahrenheit) recordings at weather stations near airports in New York City. Instead of considering all hours in 2013 for all three airports in NYC, let’s focus on the hourly temperature at Newark airport (origin code “EWR”) for the first 15 days in January 2013. The weather data frame in the nycflights13 package contains this data, but we first need to filter it to only include those rows that correspond to Newark in the first 15 days of January.

      +
      early_january_weather <- weather %>% 
      +  filter(origin == "EWR" & month == 1 & day <= 15)
      +

      This is similar to the previous use of the filter command in Section 3.3, however we now use the & operator. The above selects only those rows in weather where the originating airport is "EWR" and we are in the first month and the day is from 1 to 15 inclusive.

      +
      +

      +Learning check +

      +
      +

      (LC3.9) Take a look at both the weather and early_january_weather data frames by running View(weather) and View(early_january_weather) in the console. In what respect do these data frames differ?

      +

      (LC3.10) View() the flights data frame again. Why does the time_hour variable uniquely identify the hour of the measurement whereas the hour variable does not?

      +
      + +
      +
      +

      3.4.1 Linegraphs via geom_line

      +

      We plot a linegraph of hourly temperature using geom_line():

      +
      ggplot(data = early_january_weather, mapping = aes(x = time_hour, y = temp)) +
      +  geom_line()
      +
      +Hourly Temperature in Newark for January 1-15, 2013 +

      +Figure 3.8: Hourly Temperature in Newark for January 1-15, 2013 +

      +
      +

      Much as with the ggplot() call in Chapter 3.3.1, we describe the components of the Grammar of Graphics:

      +
        +
      • Within the ggplot() function call, we specify two of the components of the grammar: +
          +
        1. The data frame to be early_january_weather by setting data = early_january_weather
        2. +
        3. The aesthetic mapping by setting aes(x = time_hour, y = temp). Specifically +
            +
          • time_hour (i.e. the time variable) maps to the x position
          • +
          • temp maps to the y position
          • +
        4. +
      • +
      • We add a layer to the ggplot() function call using the + sign
      • +
      • The layer in question specifies the third component of the grammar: the geometric object in question. In this case the geometric object is a line, set by specifying geom_line().
      • +
      +
      +

      +Learning check +

      +
      +

      (LC3.11) Why should linegraphs be avoided when there is not a clear ordering of the horizontal axis?

      +

      (LC3.12) Why are linegraphs frequently used when time is the explanatory variable?

      +

      (LC3.13) Plot a time series of a variable other than temp for Newark Airport in the first 15 days of January 2013.

      +
      + +
      +
      +
      +

      3.4.2 Summary

      +

      Linegraphs, just like scatterplots, display the relationship between two numerical variables. However, the variable on the x-axis (i.e. the explanatory variable) should have a natural ordering, like some notion of time. We can mislead our audience if that isn’t the case.

      +
      +
      +
      +

      3.5 5NG#3: Histograms

      +

      Let’s consider the temp variable in the weather data frame once again, but now unlike with the linegraphs in Chapter 3.4, let’s say we don’t care about the relationship of temperature to time, but rather we care about the (statistical) distribution of temperatures. We could just produce points where each of the different values appear on something similar to a number line:

      Figure 3.9: Plot of Hourly Temperature Recordings from NYC in 2013

      This gives us a general idea of how the values of temp differ. We see that temperatures vary from around 11 up to 100 degrees Fahrenheit. The area between 40 and 60 degrees appears to have more points plotted than outside that range.
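      A minimal sketch of one way to draw such a strip of points, assuming the weather data frame from nycflights13 (this is our approximation of Figure 3.9, not necessarily the code used to make it):

```r
library(ggplot2)
library(nycflights13)

# Plot every hourly temperature at the same height so only the x position varies:
ggplot(data = weather, mapping = aes(x = temp, y = 0)) +
  geom_point(alpha = 0.2)
```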

      +
      +

      3.5.1 Histograms via geom_histogram

      +

      What is commonly produced instead of the above plot is a plot known as a histogram. The histogram shows how many elements of a single numerical variable fall in specified bins. In this case, these bins may correspond to between 0-10°F, 10-20°F, etc. We produce a histogram of the hourly temperatures at all three NYC airports in 2013:

      +
      ggplot(data = weather, mapping = aes(x = temp)) +
      +  geom_histogram()
      +
      `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
      +
      Warning: Removed 1 rows containing non-finite values (stat_bin).
      +
      +Histogram of Hourly Temperature Recordings from NYC in 2013 +

      +Figure 3.10: Histogram of Hourly Temperature Recordings from NYC in 2013 +

      +
      +

      Note here:

      +
        +
      • There is only one variable being mapped in aes(): the single numerical variable temp. You don’t need to compute the y-aesthetic: it gets computed automatically.
      • We set the geometric object to be geom_histogram().
      • We got a warning message that 1 row containing non-finite values was removed. This is due to one of the values of temperature being missing. R is alerting us that this happened.
      • The other message urges us to specify the number of bins we’d like the histogram to use.
      +
      +
      +

      3.5.2 Adjusting the bins

      +

      We can adjust characteristics of the bins in one of two ways:

      +
        +
      1. By adjusting the number of bins via the bins argument
      2. By adjusting the width of the bins via the binwidth argument

      First, we have the power to specify how many bins we would like to put the data into as an argument in the geom_histogram() function. By default, this is chosen to be 30 somewhat arbitrarily; we have received a warning above our plot that this was done.

      +
      ggplot(data = weather, mapping = aes(x = temp)) +
      +  geom_histogram(bins = 60, color = "white")
      +
      +Histogram of Hourly Temperature Recordings from NYC in 2013 - 60 Bins +

      +Figure 3.11: Histogram of Hourly Temperature Recordings from NYC in 2013 - 60 Bins +

      +
      +

      Note the addition of the color argument. If you’d like to be able to more easily differentiate each of the bins, you can specify the color of the outline as done above. You can also adjust the color of the bars by setting the fill argument. Type colors() in your console to see all 657 available colors.

      +
      ggplot(data = weather, mapping = aes(x = temp)) +
      +  geom_histogram(bins = 60, color = "white", fill = "steelblue")
      +
      +Histogram of Hourly Temperature Recordings from NYC in 2013 - 60 Colored Bins +

      +Figure 3.12: Histogram of Hourly Temperature Recordings from NYC in 2013 - 60 Colored Bins +

      +
      +

      Second, instead of specifying the number of bins, we can also specify the width of the bins by using the binwidth argument in the geom_histogram function.

      +
      ggplot(data = weather, mapping = aes(x = temp)) +
      +  geom_histogram(binwidth = 10, color = "white")
      +
      +Histogram of Hourly Temperature Recordings from NYC in 2013 - Binwidth = 10 +

      +Figure 3.13: Histogram of Hourly Temperature Recordings from NYC in 2013 - Binwidth = 10 +

      +
      +
      +

      +Learning check +

      +
      +

      (LC3.14) What does changing the number of bins from 30 to 60 tell us about the distribution of temperatures?

      +

      (LC3.15) Would you classify the distribution of temperatures as symmetric or skewed?

      +

      (LC3.16) What would you guess is the “center” value in this distribution? Why did you make that choice?

      +

      (LC3.17) Is this data spread out greatly from the center or is it close? Why?

      +
      + +
      +
      +
      +

      3.5.3 Summary

      +

      Histograms, unlike scatterplots and linegraphs, present information on only a single numerical variable. In particular they are visualizations of the (statistical) distribution of values.

      +
      +
      +
      +

      3.6 Facets

      +

      Before continuing the 5NG, we briefly introduce a new concept called faceting. Faceting is used when we’d like to create small multiples of the same plot over a different categorical variable. By default, all of the small multiples will have the same vertical axis.

      +

      For example, suppose we were interested in looking at how the temperature histograms we saw in Chapter 3.5 varied by month. This is what is meant by “the distribution of a variable over another variable”: temp is one variable and month is the other variable. In order to look at histograms of temp for each month, we add a layer facet_wrap(~ month). You can also specify how many rows you’d like the small multiple plots to be in using nrow or how many columns using ncol inside of facet_wrap.

      +
      ggplot(data = weather, mapping = aes(x = temp)) +
      +  geom_histogram(binwidth = 5, color = "white") +
      +  facet_wrap(~ month, nrow = 4)
      +
      +Faceted histogram +

      +Figure 3.14: Faceted histogram +

      +
      +

      Note the use of the ~ before month in facet_wrap. The tilde (~) is required and you’ll receive the error Error in as.quoted(facets) : object 'month' not found if you don’t include it before month here.

      +

      As we might expect, the temperature tends to increase as summer approaches and then decrease as winter approaches.

      +
      +

      +Learning check +

      +
      +

      (LC3.18) What other things do you notice about the faceted plot above? How does a faceted plot help us see relationships between two variables?

      +

      (LC3.19) What do the numbers 1-12 correspond to in the plot above? What about 25, 50, 75, 100?

      +

      (LC3.20) For which types of data-sets would these types of faceted plots not work well in comparing relationships between variables? Give an example describing the nature of these variables and other important characteristics.

      +

      (LC3.21) Does the temp variable in the weather data-set have a lot of variability? Why do you say that?

      +
      + +
      +
      +
      +

      3.7 5NG#4: Boxplots

      +

      While using faceted histograms can provide a way to compare distributions of a numerical variable split by groups of a categorical variable as in Section 3.6, an alternative plot called a boxplot (also called a side-by-side boxplot) achieves the same task and is frequently preferred. The boxplot uses the information provided in the five-number summary referred to in Appendix A. It gives a way to compare this summary information across the different levels of a categorical variable.

      +
      +

      3.7.1 Boxplots via geom_boxplot

      +

      Let’s create a boxplot to compare the monthly temperatures as we did above with the faceted histograms.

      +
      ggplot(data = weather, mapping = aes(x = month, y = temp)) +
      +  geom_boxplot()
      +
      +Invalid boxplot specification +

      +Figure 3.15: Invalid boxplot specification +

      +
      +
      Warning messages:
      +1: Continuous x aesthetic -- did you forget aes(group=...)? 
      +2: Removed 1 rows containing non-finite values (stat_boxplot). 
      +

      Note the set of warnings that is given here. The second warning corresponds to missing values in the data frame and it is turned off on subsequent plots. Let’s focus on the first warning.

      +

      Observe that this plot does not look like what we were expecting. We were expecting to see the distribution of temperatures for each month (so 12 different boxplots). The first warning is letting us know that we are plotting a numerical, and not categorical variable, on the x-axis. This gives us the overall boxplot without any other groupings. We can get around this by introducing a new function for our x variable:

      +
      ggplot(data = weather, mapping = aes(x = factor(month), y = temp)) +
      +  geom_boxplot()
      +
      +Month by temp boxplot +

      +Figure 3.16: Month by temp boxplot +

      +
      +

      We have introduced a new function called factor() which converts a numerical variable to a categorical one. This is necessary as geom_boxplot requires the x variable to be a categorical variable, which the variable month is not. So after applying factor(month), month goes from having numerical values 1, 2, …, 12 to having labels “1”, “2”, …, “12”. The resulting Figure 3.16 shows 12 separate “box and whiskers” plots with the following features:

      +
        +
      • The “box” portions of this plot represent the 25th percentile AKA the 1st quartile, the median AKA the 50th percentile AKA the 2nd quartile, and the 75th percentile AKA the 3rd quartile.
      • +
      • The height of each box, i.e. the value of the 3rd quartile minus the value of the 1st quartile, is called the interquartile range (\(IQR\)). It is a measure of spread of the middle 50% of values, with longer boxes indicating more variability.
      • +
      • The “whisker” portions of these plots extend out from the bottoms and tops of the boxes and represent points less than the 25th percentile and greater than the 75th percentiles respectively. They’re set to extend out no more than \(1.5 \times IQR\) units away from either end of the boxes. We say “no more than” because the ends of the whiskers represent the first observed values of temp to be within the range of the whiskers. The length of these whiskers show how the data outside the middle 50% of values vary, with longer whiskers indicating more variability.
      • +
      • The dots representing values falling outside the whiskers are called outliers. It is important to keep in mind that the definition of an outlier is somewhat arbitrary and not absolute. In this case, they are defined by the length of the whiskers, which are no more than \(1.5 \times IQR\) units long.
      • +
      +

      Looking at this plot we can see, as expected, that summer months (6 through 8) have higher median temperatures as evidenced by the higher solid lines in the middle of the boxes. We can easily compare temperatures across months by drawing imaginary horizontal lines across the plot. Furthermore, the height of the 12 boxes as quantified by the interquartile ranges are informative too; they tell us about variability, or spread, of temperatures recorded in a given month.

      +

      But to really bring home what boxplots show, let’s focus only on the month of November’s 2141 temperature recordings.

      Figure 3.17: November boxplot

      Now let’s plot all 2141 temperature recordings for November on top of the boxplot in Figure 3.18.

      Figure 3.18: November boxplot with points
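      A sketch of how Figures 3.17 and 3.18 could be produced (our assumption; the chapter does not show this code):

```r
library(dplyr)
library(ggplot2)
library(nycflights13)

november <- weather %>% filter(month == 11)

# Figure 3.17: the November boxplot on its own
ggplot(data = november, mapping = aes(x = factor(month), y = temp)) +
  geom_boxplot()

# Figure 3.18: the same boxplot with all of November's hourly recordings overlaid
ggplot(data = november, mapping = aes(x = factor(month), y = temp)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, height = 0, alpha = 0.2)
```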

      What the boxplot does is summarize the 2141 points for you, in particular:

      +
        +
      1. 25% of points (about 534 observations) fall below the bottom edge of the box, which is the first quartile of 35.96 degrees Fahrenheit (2.2 degrees Celsius). In other words, 25% of observations were colder than 35.96 degrees Fahrenheit.
      2. 25% of points fall between the bottom edge of the box and the solid middle line, which is the median of 44.96 degrees Fahrenheit (7.2 degrees Celsius). In other words, 25% of observations were between 35.96 and 44.96 degrees Fahrenheit.
      3. 25% of points fall between the solid middle line and the top edge of the box, which is the third quartile of 51.98 degrees Fahrenheit (11.1 degrees Celsius). In other words, 25% of observations were between 44.96 and 51.98 degrees Fahrenheit.
      4. 25% of points fall above the top edge of the box. In other words, 25% of observations were warmer than 51.98 degrees Fahrenheit.
      5. The middle 50% of points lie within the interquartile range of 16.02 degrees Fahrenheit.
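      These quoted values can be checked directly; the following is our own sketch, not part of the original chapter:

```r
library(dplyr)
library(nycflights13)

# Quartiles of November temperatures (na.rm guards against the one missing temp value)
weather %>%
  filter(month == 11) %>%
  summarize(first_quartile = quantile(temp, 0.25, na.rm = TRUE),
            median = median(temp, na.rm = TRUE),
            third_quartile = quantile(temp, 0.75, na.rm = TRUE),
            IQR = third_quartile - first_quartile)
```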

      +Learning check +

      +
      +

      (LC3.22) What does the dot at the bottom of the plot for May correspond to? Explain what might have occurred in May to produce this point.

      +

      (LC3.23) Which months have the highest variability in temperature? What reasons do you think this is?

      +

      (LC3.24) We looked at the distribution of a numerical variable over a categorical variable here with this boxplot. Why can’t we look at the distribution of one numerical variable over the distribution of another numerical variable? Say, temperature across pressure, for example?

      +

      (LC3.25) Boxplots provide a simple way to identify outliers. Why may outliers be easier to identify when looking at a boxplot instead of a faceted histogram?

      +
      + +
      +
      +
      +

      3.7.2 Summary

      +

      Boxplots provide a way to compare and contrast the distribution of one quantitative variable across multiple levels of one categorical variable. One can see where the median falls across the different groups by looking at the center line in the box. To see how spread out the variable is across the different groups, look at both the width of the box and also how far the lines stretch vertically from the box. (If the lines stretch far from the box but the box has a small width, the variability of the values closer to the center is much smaller than the variability of the outer ends of the variable.) Outliers are even more easily identified when looking at a boxplot than when looking at a histogram.

      +
      +
      +
      +

      3.8 5NG#5: Barplots

      +

      Both histograms and boxplots represent ways to visualize the variability of numerical variables. Another common task is to present the distribution of a categorical variable. This is a simpler task, focused on how many elements from the data fall into different categories of the categorical variable. Often the best way to visualize these different counts (also known as frequencies) is via a barplot, also known as a barchart.

      +

      One complication, however, is how your data is represented: is the categorical variable of interest “pre-counted” or not? For example, run the following code in your Console. This code manually creates two data frames representing a collection of fruit: 3 apples and 2 oranges.

      +
      fruits <- data_frame(
      +  fruit = c("apple", "apple", "apple", "orange", "orange")
      +)
      +fruits_counted <- data_frame(
      +  fruit = c("apple", "orange"),
      +  number = c(3, 2)
      +)
      +

      We see both the fruits and fruits_counted data frames represent the same collection of fruit. Whereas fruits just lists the fruit individually:

      Table 3.3: Fruits

      | fruit  |
      |--------|
      | apple  |
      | apple  |
      | apple  |
      | orange |
      | orange |

      fruits_counted has a variable number which represents the pre-counted values of each fruit.

      Table 3.4: Fruits (Pre-Counted)

      | fruit  | number |
      |--------|--------|
      | apple  | 3      |
      | orange | 2      |

      3.8.1 Barplots via geom_bar/geom_col

      +

      Let’s generate barplots using these two different representations of the same basket of fruit: 3 apples and 2 oranges. Using the not pre-counted data fruits from Table 3.3:

      +
      ggplot(data = fruits, mapping = aes(x = fruit)) +
      +  geom_bar()
      +
      +Barplot when counts are not pre-counted +

      +Figure 3.19: Barplot when counts are not pre-counted +

      +
      +

      and using the pre-counted data fruits_counted from Table 3.4:

      +
      ggplot(data = fruits_counted, mapping = aes(x = fruit, y = number)) +
      +  geom_col()
      +
      +Barplot when counts are pre-counted +

      +Figure 3.20: Barplot when counts are pre-counted +

      +
      +

      Compare the barplots in Figures 3.19 and 3.20, which are identical, but are based on the two different data frames. Observe that:

      +
        +
      • The code that generates Figure 3.19 based on fruits does not map a variable to the y aesthetic and uses geom_bar().
      • +
      • The code that generates Figure 3.20 based on fruits_counted maps the number variable to the y aesthetic and uses geom_col()
      • +
      +

      Stating the above differently:

      +
        +
      • When the categorical variable you want to plot is not pre-counted in your data frame you need to use geom_bar().
      • +
      • When the categorical variable is pre-counted (in the above fruits_counted example in the variable number), you need to use geom_col() with the y aesthetic explicitly mapped.
      • +
      +

      Please note that understanding this difference is one of ggplot2’s trickier aspects that causes the most confusion, and fortunately this is as complicated as our use of ggplot2 is going to get. Let’s consider a different distribution: the distribution of airlines that flew out of New York City in 2013. Here we explore the number of flights from each airline/carrier. This can be plotted by invoking the geom_bar function in ggplot2:

      +
      ggplot(data = flights, mapping = aes(x = carrier)) +
      +  geom_bar()
      +
      +Number of flights departing NYC in 2013 by airline using geom_bar +

      +Figure 3.21: Number of flights departing NYC in 2013 by airline using geom_bar +

      +
      +

      To get an understanding of what the names of these airlines are corresponding to these carrier codes, we can look at the airlines data frame in the nycflights13 package.

      +
      airlines
      | carrier | name                        |
      |---------|-----------------------------|
      | 9E      | Endeavor Air Inc.           |
      | AA      | American Airlines Inc.      |
      | AS      | Alaska Airlines Inc.        |
      | B6      | JetBlue Airways             |
      | DL      | Delta Air Lines Inc.        |
      | EV      | ExpressJet Airlines Inc.    |
      | F9      | Frontier Airlines Inc.      |
      | FL      | AirTran Airways Corporation |
      | HA      | Hawaiian Airlines Inc.      |
      | MQ      | Envoy Air                   |
      | OO      | SkyWest Airlines Inc.       |
      | UA      | United Air Lines Inc.       |
      | US      | US Airways Inc.             |
      | VX      | Virgin America              |
      | WN      | Southwest Airlines Co.      |
      | YV      | Mesa Airlines Inc.          |

      Going back to our barplot, we see that United Air Lines, JetBlue Airways, and ExpressJet Airlines had the most flights depart New York City in 2013. To get the actual number of flights by each airline we can use the group_by(), summarize(), and n() functions in the dplyr package on the carrier variable in flights, which we will introduce formally in Chapter 5.

      +
      flights_table <- flights %>% 
      +  group_by(carrier) %>% 
      +  summarize(number = n())
      +flights_table
      | carrier | number |
      |---------|--------|
      | 9E      | 18460  |
      | AA      | 32729  |
      | AS      | 714    |
      | B6      | 54635  |
      | DL      | 48110  |
      | EV      | 54173  |
      | F9      | 685    |
      | FL      | 3260   |
      | HA      | 342    |
      | MQ      | 26397  |
      | OO      | 32     |
      | UA      | 58665  |
      | US      | 20536  |
      | VX      | 5162   |
      | WN      | 12275  |
      | YV      | 601    |

      In this table, the counts of the carriers are pre-counted. To create a barplot using the data frame flights_table, we

      +
        +
      • use geom_col() instead of geom_bar()
      • +
      • map the y aesthetic to the variable number.
      • +
      +

      Compare this barplot using geom_col in Figure 3.22 with the earlier barplot using geom_bar in Figure 3.21. They are identical. However the input data we used for these are different.

      +
      ggplot(data = flights_table, mapping = aes(x = carrier, y = number)) +
      +  geom_col()
      +
      +Number of flights departing NYC in 2013 by airline using geom_col +

      +Figure 3.22: Number of flights departing NYC in 2013 by airline using geom_col +

      +
      + +
      +

      +Learning check +

      +
      +

      (LC3.26) Why are histograms inappropriate for visualizing categorical variables?

      +

      (LC3.27) What is the difference between histograms and barplots?

      +

      (LC3.28) How many Envoy Air flights departed NYC in 2013?

      +

      (LC3.29) What was the seventh highest airline in terms of departed flights from NYC in 2013? How could we better present the table to get this answer quickly?
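      One way to re-present the table for this learning check (our sketch, not the book’s solution) is to sort it by descending counts so the seventh row answers the question directly:

```r
library(dplyr)
# Sort the pre-counted table from most to fewest departing flights
flights_table %>%
  arrange(desc(number))
```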

      +
      + +
      +
      +
      +

      3.8.2 Must avoid pie charts!

      +

      Unfortunately, one of the most common plots seen today for categorical data is the pie chart. While they may seem harmless enough, they actually present a problem in that humans are unable to judge angles well. As Naomi Robbins describes in her book “Creating More Effective Graphs” (Robbins 2013), we overestimate angles greater than 90 degrees and we underestimate angles less than 90 degrees. In other words, it is difficult for us to determine the relative size of one piece of the pie compared to another.

      +

      Let’s examine our previous barplot example on the number of flights departing NYC by airline. This time we will use a pie chart. As you review this chart, try to identify

      +
        +
      • how much larger the portion of the pie is for ExpressJet Airlines (EV) compared to US Airways (US),
      • +
      • what the third largest carrier is in terms of departing flights, and
      • +
      • how many carriers have fewer flights than United Airlines (UA)?
      • +
      +
      Figure 3.23: The dreaded pie chart
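      For reference, a pie chart like Figure 3.23 could be sketched from the pre-counted flights_table created above (this is our assumed construction, not necessarily the book’s code):

```r
library(ggplot2)
# A single stacked bar converted to polar coordinates gives a pie chart
ggplot(data = flights_table, mapping = aes(x = "", y = number, fill = carrier)) +
  geom_col(width = 1) +
  coord_polar(theta = "y")
```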

      While it is quite easy to look back at the barplot to get the answer to these questions, it’s quite difficult to get the answers correct when looking at the pie graph. Barplots can always present the information in a way that is easier for the eye to determine relative position. There may be one exception from Nathan Yau at FlowingData.com but we will leave this for the reader to decide:

      +
      +The only good pie chart +

      +Figure 3.24: The only good pie chart +

      +
      +
      +

      +Learning check +

      +
      +

      (LC3.30) Why should pie charts be avoided and replaced by barplots?

      +

      (LC3.31) What is your opinion as to why pie charts continue to be used?

      +
      + +
      +
      +
      +

      3.8.3 Using barplots to compare two categorical variables

      +

      Barplots are the go-to way to visualize the frequency of different categories of a categorical variable. They make it easy to order the counts and to compare the frequencies of one group to another. Another use of barplots (unfortunately, sometimes inappropriately and confusingly) is to compare two categorical variables together. Let’s examine the distribution of outgoing flights from NYC by carrier and airport.

      +

      We begin by getting the names of the airports in NYC that were included in the flights data-set. Here, we preview the inner_join() function from Chapter 5. This function will join the data frame flights with the data frame airports by matching rows that have the same airport code. However, in flights the airport code is included in the origin variable whereas in airports the airport code is included in the faa variable. We will revisit such examples in Section 5.8 on joining data-sets.

      +
      flights_namedports <- flights %>% 
      +  inner_join(airports, by = c("origin" = "faa"))
      +

      After running View(flights_namedports), we see that name now corresponds to the name of the airport as referenced by the origin variable. We will now plot carrier as the horizontal variable. When we specify geom_bar, it will specify count as being the vertical variable. A new addition here is fill = name. Look over what was produced from the plot to get an idea of what this argument gives.

      +
      ggplot(data = flights_namedports, mapping = aes(x = carrier, fill = name)) +
      +  geom_bar()
      +
      +Stacked barplot comparing the number of flights by carrier and airport +

      +Figure 3.25: Stacked barplot comparing the number of flights by carrier and airport +

      +
      +

      This plot is what is known as a stacked barplot. While simple to make, it often leads to many problems. For example in this plot, it is difficult to compare the heights of the different colors (corresponding to the number of flights from each airport) between the bars (corresponding to the different carriers).

      +

      Note that fill is an aesthetic just like x is an aesthetic, and thus must be included within the parentheses of the aes() mapping. The following code, where the fill aesthetic is specified on the outside will yield an error. This is a fairly common error that new ggplot users make:

      +
      ggplot(data = flights_namedports, mapping = aes(x = carrier), fill = name) +
      +  geom_bar()
      +
      +

      +Learning check +

      +
      +

      (LC3.32) What kinds of questions are not easily answered by looking at the above figure?

      +

      (LC3.33) What can you say, if anything, about the relationship between airline and airport in NYC in 2013 in regards to the number of departing flights?

      +
      + +
      +

      Another variation on the stacked barplot is the side-by-side barplot also called a dodged barplot.

      +
      ggplot(data = flights_namedports, mapping = aes(x = carrier, fill = name)) +
      +  geom_bar(position = "dodge")
      +
      +Side-by-side AKA dodged barplot comparing the number of flights by carrier and airport +

      +Figure 3.26: Side-by-side AKA dodged barplot comparing the number of flights by carrier and airport +

      +
      +
      +

      +Learning check +

      +
      +

      (LC3.34) Why might the side-by-side (AKA dodged) barplot be preferable to a stacked barplot in this case?

      +

      (LC3.35) What are the disadvantages of using a side-by-side (AKA dodged) barplot, in general?

      +
      + +
      +

      Lastly, an often preferred type of barplot is the faceted barplot. We already saw this concept of faceting and small multiples in Section 3.6. This gives us a nicer way to compare the distributions across both carrier and airport/name.

      +
      ggplot(data = flights_namedports, mapping = aes(x = carrier, fill = name)) +
      +  geom_bar() +
      +  facet_wrap(~ name, ncol = 1)
      +
      +Faceted barplot comparing the number of flights by carrier and airport +

      +Figure 3.27: Faceted barplot comparing the number of flights by carrier and airport +

      +
      +
      +

      +Learning check +

      +
      +

      (LC3.36) Why is the faceted barplot preferred to the side-by-side and stacked barplots in this case?

      +

      (LC3.37) What information about the different carriers at different airports is more easily seen in the faceted barplot?

      +
      + +
      +
      +
      +

      3.8.4 Summary

      +

      Barplots are the preferred way of displaying categorical variables. They are easy to understand and make it easy to compare across groups of a categorical variable. When dealing with more than one categorical variable, faceted barplots are frequently preferred over side-by-side or stacked barplots. Stacked barplots are sometimes nice to look at, but it is quite difficult to compare across levels since the bars are all of different sizes. Side-by-side barplots can provide an improvement on this, but the issue of comparing across groups still must be dealt with.

      +
      +
      +
      +

      3.9 Conclusion

      +
      +

      3.9.1 Putting it all together

      +

      Let’s recap all five of the Five Named Graphs (5NG) in Table 3.5 summarizing their differences. Using these 5NG, you’ll be able to visualize the distributions and relationships of variables contained in a wide array of datasets. This will be even more the case as we start to map more variables to more of each geometric object’s aesthetic attribute options, further unlocking the awesome power of the ggplot2 package.

Table 3.5: Summary of 5NG

|   | Named graph | Shows | Geometric object | Notes |
|---|---|---|---|---|
| 1 | Scatterplot | Relationship between 2 numerical variables | geom_point() | |
| 2 | Linegraph | Relationship between 2 numerical variables | geom_line() | Used when there is a sequential order to the x-variable, e.g. time |
| 3 | Histogram | Distribution of 1 numerical variable | geom_histogram() | Faceted histogram shows distribution of 1 numerical variable split by 1 categorical variable |
| 4 | Boxplot | Distribution of 1 numerical variable split by 1 categorical variable | geom_boxplot() | |
| 5 | Barplot | Distribution of 1 categorical variable | geom_bar() when counts are not pre-counted; geom_col() when counts are pre-counted | Stacked & dodged barplots show distribution of 2 categorical variables |
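To illustrate the geom_bar() vs geom_col() distinction noted in Table 3.5, here is a minimal sketch. It uses a small hypothetical data frame carrier_counts of pre-counted flights per carrier; this data frame and its values are made up for illustration and are not one of the book's datasets.

library(ggplot2)

# Hypothetical pre-counted data: one row per carrier, with a count column n
carrier_counts <- data.frame(carrier = c("AA", "DL", "UA"),
                             n = c(100, 150, 80))

# geom_col() plots the pre-counted values in n directly
ggplot(carrier_counts, aes(x = carrier, y = n)) +
  geom_col()

# geom_bar() counts rows for you, so it is applied to raw, un-aggregated data
# (one row per flight), mapping only x, e.g.:
# ggplot(flights, aes(x = carrier)) + geom_bar()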

      3.9.2 Review questions


      Review questions have been designed using the fivethirtyeight R package (Kim, Ismay, and Chunn 2019) with links to the corresponding FiveThirtyEight.com articles in our free DataCamp course Effective Data Storytelling using the tidyverse. The material in this chapter is covered in the chapters of the DataCamp course available below:


      3.9.3 What’s to come?


      In Chapter 4, we’ll introduce the concept of “tidy data” and how it is used as a key data format for all the packages we use in this textbook. You’ll see that the concept appears to be simple, but actually can be a little challenging to decipher without careful practice. We’ll also investigate how to import CSV (comma-separated value) files into R using the readr package.


      3.9.4 Resources


An excellent resource as you begin to create plots using the ggplot2 package is a cheatsheet that RStudio has put together entitled "Data Visualization with ggplot2", available:

• by clicking here or
• by clicking the RStudio Menu Bar -> Help -> Cheatsheets -> "Data Visualization with ggplot2"

      This cheatsheet covers more than what we’ve discussed in this chapter but provides nice visual descriptions of what each function produces.


      3.9.5 Script of R code


      An R script file of all R code used in this chapter is available here.


      6 Basic Regression


Now that we are equipped with data visualization skills from Chapter 3, an understanding of the "tidy" data format from Chapter 4, and data wrangling skills from Chapter 5, let's proceed with data modeling. The fundamental premise of data modeling is to make explicit the relationship between:

• an outcome variable \(y\), also called a dependent variable, and
• an explanatory/predictor variable \(x\), also called an independent variable or covariate.

      Another way to state this is using mathematical terminology: we will model the outcome variable \(y\) as a function of the explanatory/predictor variable \(x\). Why do we have two different labels, explanatory and predictor, for the variable \(x\)? That’s because roughly speaking data modeling can be used for two purposes:

1. Modeling for prediction: You want to predict an outcome variable \(y\) based on the information contained in a set of predictor variables. You don't care so much about understanding how all the variables relate and interact, but so long as you can make good predictions about \(y\), you're fine. For example, if we know many individuals' risk factors for lung cancer, such as smoking habits and age, can we predict whether or not they will develop lung cancer? Here we wouldn't care so much about distinguishing the degree to which the different risk factors contribute to lung cancer, but instead only on whether or not they could be put together to make reliable predictions.
2. Modeling for explanation: You want to explicitly describe the relationship between an outcome variable \(y\) and a set of explanatory variables, determine the significance of any found relationships, and have measures summarizing these. Continuing our example from above, we would now be interested in describing the individual effects of the different risk factors and quantifying the magnitude of these effects. One reason could be to design an intervention to reduce lung cancer cases in a population, such as targeting smokers of a specific age group with an advertisement for smoking cessation programs. In this book, we'll focus more on this latter purpose.

      Data modeling is used in a wide variety of fields, including statistical inference, causal inference, artificial intelligence, and machine learning. There are many techniques for data modeling, such as tree-based models, neural networks and deep learning, and supervised learning. In this chapter, we’ll focus on one particular technique: linear regression, one of the most commonly-used and easy-to-understand approaches to modeling. Recall our discussion in Subsection 2.4.3 on numerical and categorical variables. Linear regression involves:

• an outcome variable \(y\) that is numerical and
• explanatory variables \(\vec{x}\) that are either numerical or categorical.

      With linear regression there is always only one numerical outcome variable \(y\) but we have choices on both the number and the type of explanatory variables \(\vec{x}\) to use. We’re going to cover the following regression scenarios:

• In this current chapter on basic regression, we'll always have only one explanatory variable.
  • In Section 6.1, this explanatory variable will be a single numerical explanatory variable \(x\). This scenario is known as simple linear regression.
  • In Section 6.2, this explanatory variable will be a categorical explanatory variable \(x\).
• In the next chapter, Chapter 7 on multiple regression, we'll have more than one explanatory variable:
  • We'll focus on two numerical explanatory variables \(x_1\) and \(x_2\) in Section 7.1. This can be denoted as \(\vec{x}\) as well since we have more than one explanatory variable.
  • We'll use one numerical and one categorical explanatory variable in Section 7.1. We'll also introduce interaction models here; there, the effect of one explanatory variable depends on the value of another.

      We’ll study all four of these regression scenarios using real data, all easily accessible via R packages!


      Needed packages


In this chapter we introduce a new package, moderndive, that accompanies this ModernDive book. It includes functions tailored to linear regression, other useful functions, and data used later in the book. Let's now load all the packages needed for this chapter. If needed, read Section 2.3 for information on how to install and load R packages.

library(ggplot2)
library(dplyr)
library(moderndive)
library(gapminder)
library(skimr)

      DataCamp


      The introductory basic regression analysis below was the inspiration for a large part of ModernDive co-author Albert Y. Kim’s DataCamp course “Modeling with Data in the Tidyverse.” If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 “Introduction to Modeling” and Chapter 2 “Modeling with Basic Regression”.


      6.1 One numerical explanatory variable


Why do some professors and instructors at universities and colleges get high teaching evaluations from students while others don't? What factors can explain these differences? Are there biases? These are questions that are of interest to university/college administrators, as teaching evaluations are among the many criteria considered in determining which professors and instructors should get promotions. Researchers at the University of Texas in Austin, Texas (UT Austin) tried to answer this question: what factors can explain differences in instructors' teaching evaluation scores? To this end, they collected information on \(n = 463\) instructors. A full description of the study can be found at openintro.org.


      We’ll keep things simple for now and try to explain differences in instructor evaluation scores as a function of one numerical variable: their “beauty score.” The specifics on how this score was calculated will be described shortly.


      Could it be that instructors with higher beauty scores also have higher teaching evaluations? Could it be instead that instructors with higher beauty scores tend to have lower teaching evaluations? Or could it be there is no relationship between beauty score and teaching evaluations?


We'll address these questions by modeling the relationship between these two variables with a particular kind of linear regression called simple linear regression. Simple linear regression is the most basic form of linear regression. With it we have

1. A numerical outcome variable \(y\). In this case, their teaching score.
2. A single numerical explanatory variable \(x\). In this case, their beauty score.

      6.1.1 Exploratory data analysis


      A crucial step before doing any kind of modeling or analysis is performing an exploratory data analysis, or EDA, of all our data. Exploratory data analysis can give you a sense of the distribution of the data, and whether there are outliers and/or missing values. Most importantly, it can inform how to build your model. There are many approaches to exploratory data analysis; here are three:

1. Most fundamentally: just looking at the raw values, in a spreadsheet for example. While this may seem trivial, many people ignore this crucial step!
2. Computing summary statistics like means, medians, and standard deviations.
3. Creating data visualizations.

      Let’s load the data, select only a subset of the variables, and look at the raw values. Recall you can look at the raw values by running View() in the console in RStudio to pop-up the spreadsheet viewer with the data frame of interest as the argument to View(). Here, however, we present only a snapshot of five randomly chosen rows:

evals_ch6 <- evals %>%
  select(score, bty_avg, age)

evals_ch6 %>%
  sample_n(5)
Table 6.1: Random sample of 5 instructors

| score | bty_avg | age |
|---|---|---|
| 3.6 | 6.67 | 34 |
| 4.9 | 3.50 | 43 |
| 3.3 | 2.33 | 47 |
| 4.4 | 4.67 | 33 |
| 4.7 | 3.67 | 60 |

      While a full description of each of these variables can be found at openintro.org, let’s summarize what each of these variables represents.

1. score: Numerical variable of the average teaching score based on students' evaluations between 1 and 5. This is the outcome variable \(y\) of interest.
2. bty_avg: Numerical variable of the average "beauty" rating based on a panel of 6 students' scores between 1 and 10. This is the numerical explanatory variable \(x\) of interest. Here 1 corresponds to a low beauty rating and 10 to a high beauty rating.
3. age: A numerical variable of age in years as an integer value.

      Another way to look at the raw values is using the glimpse() function, which gives us a slightly different view of the data. We see Observations: 463, indicating that there are 463 observations in evals, each corresponding to a particular instructor at UT Austin. Expressed differently, each row in the data frame evals corresponds to one of 463 instructors.

glimpse(evals_ch6)

Observations: 463
Variables: 3
$ score   <dbl> 4.7, 4.1, 3.9, 4.8, 4.6, 4.3, 2.8, 4.1, 3.4, 4.5, 3.8, 4.5, 4…
$ bty_avg <dbl> 5.00, 5.00, 5.00, 5.00, 3.00, 3.00, 3.00, 3.33, 3.33, 3.17, 3…
$ age     <int> 36, 36, 36, 36, 59, 59, 59, 51, 51, 40, 40, 40, 40, 40, 40, 4…

Since both the outcome variable score and the explanatory variable bty_avg are numerical, we can compute summary statistics about them such as the mean, median, and standard deviation. Let's take evals_ch6, select only the two variables of interest for now, and pipe the result into the skim() function from the skimr package. This function quickly "skims" the data and returns the following summary information about each variable.

evals_ch6 %>%
  select(score, bty_avg) %>%
  skim()

Skim summary statistics
 n obs: 463 
 n variables: 2 

── Variable type:numeric ─────
 variable missing complete   n mean   sd   p0  p25  p50 p75 p100     hist
  bty_avg       0      463 463 4.42 1.53 1.67 3.17 4.33 5.5 8.17 ▂▅▅▇▃▃▂▁
    score       0      463 463 4.17 0.54 2.3  3.8  4.3  4.6 5    ▁▁▂▃▅▇▇▆

In this case, for our two numerical variables bty_avg (beauty score) and score (teaching score), it returns:

• missing: the number of missing values
• complete: the number of non-missing or complete values
• n: the total number of values
• mean: the average
• sd: the standard deviation
• p0: the 0th percentile: the value at which 0% of observations are smaller than it. This is also known as the minimum
• p25: the 25th percentile: the value at which 25% of observations are smaller than it. This is also known as the 1st quartile
• p50: the 50th percentile: the value at which 50% of observations are smaller than it. This is also known as the 2nd quartile and more commonly the median
• p75: the 75th percentile: the value at which 75% of observations are smaller than it. This is also known as the 3rd quartile
• p100: the 100th percentile: the value at which 100% of observations are smaller than it. This is also known as the maximum
• hist: a quick snapshot of the histogram

      We get an idea of how the values in both variables are distributed. For example, the mean teaching score was 4.17 out of 5 whereas the mean beauty score was 4.42 out of 10. Furthermore, the middle 50% of teaching scores were between 3.80 and 4.6 (the first and third quartiles) while the middle 50% of beauty scores were between 3.17 and 5.5 out of 10.
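To double-check these quartiles directly, one could compute them with base R's quantile() function (a quick verification sketch, not code from the book); the results should agree, up to rounding, with the p25, p50, and p75 columns reported by skim():

# Quartiles of teaching score and beauty score
quantile(evals_ch6$score, probs = c(0.25, 0.5, 0.75))
quantile(evals_ch6$bty_avg, probs = c(0.25, 0.5, 0.75))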


      The skim() function however only returns what are called univariate summaries, i.e. summaries about single variables at a time. Since we are considering the relationship between two numerical variables, it would be nice to have a summary statistic that simultaneously considers both variables. The correlation coefficient is a bivariate summary statistic that fits this bill. Coefficients in general are quantitative expressions of a specific property of a phenomenon. A correlation coefficient is a quantitative expression between -1 and 1 that summarizes the strength of the linear relationship between two numerical variables:

• -1 indicates a perfect negative relationship: as the value of one variable goes up, the value of the other variable tends to go down.
• 0 indicates no relationship: the values of both variables go up/down independently of each other.
• +1 indicates a perfect positive relationship: as the value of one variable goes up, the value of the other variable tends to go up as well.

      Figure 6.1 gives examples of different correlation coefficient values for hypothetical numerical variables \(x\) and \(y\). We see that while for a correlation coefficient of -0.75 there is still a negative relationship between \(x\) and \(y\), it is not as strong as the negative relationship between \(x\) and \(y\) when the correlation coefficient is -1.

Figure 6.1: Different correlation coefficients

The correlation coefficient is computed using the get_correlation() function in the moderndive package. In this case, the inputs to the function are the two numerical variables for which we want to calculate the correlation coefficient. We place the name of the response variable on the left-hand side of the ~ ("tilde") and the explanatory variable on the right-hand side. We will use this same "formula" syntax with regression later in this chapter.

evals_ch6 %>%
  get_correlation(formula = score ~ bty_avg)

# A tibble: 1 x 1
  correlation
        <dbl>
1       0.187

      The correlation coefficient can also be computed using the cor() function, where in this case the inputs to the function are the two numerical variables from which we want to calculate the correlation coefficient. Recall from Subsection 2.4.3 that the $ pulls out specific variables from a data frame:

cor(x = evals_ch6$bty_avg, y = evals_ch6$score)

[1] 0.187

      In our case, the correlation coefficient of 0.187 indicates that the relationship between teaching evaluation score and beauty average is “weakly positive.” There is a certain amount of subjectivity in interpreting correlation coefficients, especially those that aren’t close to -1, 0, and 1. For help developing such intuition and more discussion on the correlation coefficient see Subsection 6.3.1 below.


Let's now proceed by visualizing this data. Since both the score and bty_avg variables are numerical, a scatterplot is an appropriate graph to visualize this data. Let's do this using geom_point(), set informative axis labels and a title, and display the result in Figure 6.2.

ggplot(evals_ch6, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(x = "Beauty Score", y = "Teaching Score",
       title = "Relationship of teaching and beauty scores")
Figure 6.2: Instructor evaluation scores at UT Austin

      Observe the following:

1. Most "beauty" scores lie between 2 and 8.
2. Most teaching scores lie between 3 and 5.
3. Recall our earlier computation of the correlation coefficient, which describes the strength of the linear relationship between two numerical variables. Looking at Figure 6.3, it is not immediately apparent that these two variables are positively related. This is to be expected given the positive, but rather weak (close to 0), correlation coefficient of 0.187.

      Before we continue, we bring to light an important fact about this dataset: it suffers from overplotting. Recall from the data visualization Subsection 3.3.2 that overplotting occurs when several points are stacked directly on top of each other thereby obscuring the number of points. For example, let’s focus on the 6 points in the top-right of the plot with a beauty score of around 8 out of 10: are there truly only 6 points, or are there many more just stacked on top of each other? You can think of these as ties. Let’s break up these ties with a little random “jitter” added to the points in Figure 6.3.

Figure 6.3: Instructor evaluation scores at UT Austin: Jittered

      Jittering adds a little random bump to each of the points to break up these ties: just enough so you can distinguish them, but not so much that the plot is overly altered. Furthermore, jittering is strictly a visualization tool; it does not alter the original values in the dataset.
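The code that produced the jittered plot isn't shown above, but a jittered scatterplot along these lines could be created by swapping geom_point() for ggplot2's geom_jitter(); the width and height values below are illustrative guesses, not the book's settings:

ggplot(evals_ch6, aes(x = bty_avg, y = score)) +
  geom_jitter(width = 0.05, height = 0.05) +   # small random horizontal/vertical nudges
  labs(x = "Beauty Score", y = "Teaching Score",
       title = "Relationship of teaching and beauty scores (jittered)")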


      Let’s compare side-by-side the regular scatterplot in Figure 6.2 with the jittered scatterplot in Figure 6.3 in Figure 6.4.

Figure 6.4: Comparing regular and jittered scatterplots.

      We make several further observations:

1. Focusing our attention on the top-right of the plot again, where earlier there seemed to be only 6 points in the regular scatterplot, the jittered scatterplot shows there were in fact 9.
2. A further interesting trend is that the jittering revealed a large number of instructors with beauty scores of between 3 and 4.5, towards the lower end of the beauty scale.

Going forward for simplicity's sake, however, we'll only present regular scatterplots rather than jittered scatterplots; we'll just keep the overplotting in mind whenever looking at such plots. Going back to the scatterplot in Figure 6.2, let's improve on it by adding a "regression line" in Figure 6.5. This is easily done by adding a new layer to the ggplot code that created Figure 6.2: + geom_smooth(method = "lm"). A regression line is a "best fitting" line in that of all possible lines you could draw on this plot, it is "best" in terms of some mathematical criteria. We discuss the criteria for "best" in Subsection 6.3.3 below, but we suggest you read this only after covering the concept of a residual coming up in Subsection 6.1.3.

ggplot(evals_ch6, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(x = "Beauty Score", y = "Teaching Score",
       title = "Relationship of teaching and beauty scores") +
  geom_smooth(method = "lm")
Figure 6.5: Regression line

      When viewed on this plot, the regression line is a visual summary of the relationship between two numerical variables, in our case the outcome variable score and the explanatory variable bty_avg. The positive slope of the blue line is consistent with our observed correlation coefficient of 0.187 suggesting that there is a positive relationship between score and bty_avg. We’ll see later however that while the correlation coefficient is not equal to the slope of this line, they always have the same sign: positive or negative.


What are the grey bands surrounding the blue line? These are standard error bands, which can be thought of as error/uncertainty bands. Let's skip this idea for now and suppress these grey bands by adding the argument se = FALSE to geom_smooth(method = "lm"). We'll introduce standard errors in Chapter 8 on sampling, use them for constructing confidence intervals and conducting hypothesis tests in Chapters 9 and 10, and consider them when we revisit regression in Chapter 11.

ggplot(evals_ch6, aes(x = bty_avg, y = score)) +
  geom_point() +
  labs(x = "Beauty Score", y = "Teaching Score",
       title = "Relationship of teaching and beauty scores") +
  geom_smooth(method = "lm", se = FALSE)
Figure 6.6: Regression line without error bands

Learning check

      (LC6.1) Conduct a new exploratory data analysis with the same outcome variable \(y\) being score but with age as the new explanatory variable \(x\). Remember, this involves three things:

1. Looking at the raw values.
2. Computing summary statistics of the variables of interest.
3. Creating informative visualizations.

      What can you say about the relationship between age and teaching scores based on this exploration?


      6.1.2 Simple linear regression


You may recall from secondary school / high school algebra that the equation of a line is, in general, \(y = a + bx\); it is defined by two coefficients. Recall we defined coefficients earlier as "quantitative expressions of a specific property of a phenomenon." These two coefficients are:

• the intercept coefficient \(a\), or the value of \(y\) when \(x = 0\), and
• the slope coefficient \(b\), or the increase in \(y\) for every increase of one in \(x\).

      However, when defining a line specifically for regression, like the blue regression line in Figure 6.6, we use slightly different notation: the equation of the regression line is \(\widehat{y} = b_0 + b_1 \cdot x\) where

• the intercept coefficient is \(b_0\), or the value of \(\widehat{y}\) when \(x = 0\), and
• the slope coefficient is \(b_1\), or the increase in \(\widehat{y}\) for every increase of one in \(x\).

      Why do we put a “hat” on top of the \(y\)? It’s a form of notation commonly used in regression, which we’ll introduce in the next Subsection 6.1.3 when we discuss fitted values. For now, let’s ignore the hat and treat the equation of the line as you would from secondary school / high school algebra recognizing the slope and the intercept. We know looking at Figure 6.6 that the slope coefficient corresponding to bty_avg should be positive. Why? Because as bty_avg increases, professors tend to roughly have larger teaching evaluation scores. However, what are the specific values of the intercept and slope coefficients? Let’s not worry about computing these by hand, but instead let the computer do the work for us. Specifically let’s use R!


Let's get the value of the intercept and slope coefficients by outputting something called the linear regression table. We will fit the linear regression model to the data using the lm() function and save this as score_model. lm stands for "linear model", given that we are dealing with lines. When we say "fit", we mean "find the best-fitting line for this data."


      The lm() function that “fits” the linear regression model is typically used as lm(y ~ x, data = data_frame_name) where:

• y is the outcome variable, followed by a tilde (~). This is likely the key to the left of "1" on your keyboard. In our case, y is set to score.
• x is the explanatory variable. In our case, x is set to bty_avg. We call the combination y ~ x a model formula. Recall the use of this notation when we computed the correlation coefficient using the get_correlation() function in Subsection 6.1.1.
• data_frame_name is the name of the data frame that contains the variables y and x. In our case, data_frame_name is the evals_ch6 data frame.
score_model <- lm(score ~ bty_avg, data = evals_ch6)
score_model

Call:
lm(formula = score ~ bty_avg, data = evals_ch6)

Coefficients:
(Intercept)      bty_avg  
     3.8803       0.0666  

This output is telling us that the intercept coefficient \(b_0\) of the regression line is 3.8803 and the slope coefficient for bty_avg is 0.0666. Therefore the blue regression line in Figure 6.6 is


      \[\widehat{\text{score}} = b_0 + b_{\text{bty_avg}} \cdot\text{bty_avg} = 3.8803 + 0.0666\cdot\text{ bty_avg}\]


      where

• The intercept coefficient \(b_0 = 3.8803\) means that for instructors with a hypothetical beauty score of 0, we would expect them to have on average a teaching score of 3.8803. However, while the intercept has a mathematical interpretation when defining the regression line, it has no practical interpretation here: since bty_avg is an average of a panel of 6 students' ratings from 1 to 10, a bty_avg of 0 would be impossible. Furthermore, no instructors had a beauty score anywhere near 0 in this data.
• Of more interest is the slope coefficient associated with bty_avg: \(b_{\text{bty avg}} = +0.0666\). This is a numerical quantity that summarizes the relationship between the outcome and explanatory variables. Note that the sign is positive, suggesting a positive relationship between beauty scores and teaching scores: as beauty scores go up, so do teaching scores. The slope's precise interpretation is:

  For every increase of 1 unit in bty_avg, there is an associated increase of, on average, 0.0666 units of score.

Such interpretations need to be carefully worded:

• We only stated that there is an associated increase, and not necessarily a causal increase. For example, perhaps it's not that beauty directly affects teaching scores, but instead individuals from wealthier backgrounds tend to have had better education and training, and hence have higher teaching scores, while these same individuals also have higher beauty scores. Avoiding such reasoning can be summarized by the adage "correlation is not necessarily causation." In other words, just because two variables are correlated, it doesn't mean one directly causes the other. We discuss these ideas more in Subsection 6.3.2.
• We say that this associated increase is on average 0.0666 units of teaching score and not that the associated increase is exactly 0.0666 units of score across all values of bty_avg. This is because the slope is the average increase across all points as shown by the regression line in Figure 6.6. A small numerical sketch of this interpretation follows below.
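To make the slope interpretation concrete, here is a small sketch (not from the book) that uses the fitted coefficients in score_model to compute the predicted teaching score for a hypothetical instructor with a beauty score of 5; the value 5 is chosen purely for illustration:

b <- coef(score_model)   # named vector with elements "(Intercept)" and "bty_avg"

# Predicted teaching score when bty_avg = 5:
b["(Intercept)"] + b["bty_avg"] * 5
# roughly 3.8803 + 0.0666 * 5 = 4.21

# The same prediction via predict():
predict(score_model, newdata = data.frame(bty_avg = 5))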

Now that we've learned how to compute the equation for the blue regression line in Figure 6.6 and interpreted all its terms, let's take our modeling one step further. This time, after fitting the model using lm(), let's get something called the regression table using the get_regression_table() function from the moderndive package:

# Fit regression model:
score_model <- lm(score ~ bty_avg, data = evals_ch6)
# Get regression table:
get_regression_table(score_model)
Table 6.2: Linear regression table

| term | estimate | std_error | statistic | p_value | lower_ci | upper_ci |
|---|---|---|---|---|---|---|
| intercept | 3.880 | 0.076 | 50.96 | 0 | 3.731 | 4.030 |
| bty_avg | 0.067 | 0.016 | 4.09 | 0 | 0.035 | 0.099 |

      Note how we took the output of the model fit saved in score_model and used it as an input to the subsequent get_regression_table() function. The output now looks like a table: in fact it is a data frame. The values of the intercept and slope of 3.880 and 0.0666 are now in the estimate column. But what are the remaining 5 columns: std_error, statistic, p_value, lower_ci and upper_ci? What do they tell us? They tell us about both the statistical significance and practical significance of our model results. You can think of this loosely as the “meaningfulness” of the results from a statistical perspective.


      We are going to put aside these ideas for now and revisit them in Chapter 11 on (statistical) inference for regression, after we’ve had a chance to cover:

• Standard errors in Chapter 8 (std_error)
• Confidence intervals in Chapter 9 (lower_ci and upper_ci)
• Hypothesis testing in Chapter 10 (statistic and p_value).

      For now, we’ll only focus on the term and estimate columns of any regression table.


The get_regression_table() function from the moderndive package is an example of what's known as a wrapper function in computer programming, which takes other pre-existing functions and "wraps" them into a single function. This concept is illustrated in Figure 6.7.

Figure 6.7: The concept of a 'wrapper' function.

So all you need to worry about is what the inputs look like and what the outputs look like; you leave all the other details "under the hood of the car." In our regression modeling example, the get_regression_table() function has

• Input: A saved lm() linear regression
• Output: A data frame with information on the intercept and slope of the regression line.

      If you’re interested in learning more about the get_regression_table() function’s construction and thinking, see Subsection 6.3.4 below.
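As a rough illustration of the wrapper idea (a sketch of the concept only, not the actual implementation of get_regression_table()), a toy version could be assembled from base R pieces such as summary() and confint(); the function name toy_regression_table is made up for this example:

# A toy wrapper: combine coefficient estimates and confidence intervals
# into a single data frame, roughly mimicking a regression table
toy_regression_table <- function(model) {
  est <- summary(model)$coefficients   # estimates, std. errors, statistics, p-values
  ci  <- confint(model)                # lower/upper confidence interval bounds
  data.frame(
    term      = rownames(est),
    estimate  = est[, "Estimate"],
    std_error = est[, "Std. Error"],
    statistic = est[, "t value"],
    p_value   = est[, "Pr(>|t|)"],
    lower_ci  = ci[, 1],
    upper_ci  = ci[, 2],
    row.names = NULL
  )
}

toy_regression_table(score_model)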


Learning check

      (LC6.2) Fit a new simple linear regression using lm(score ~ age, data = evals_ch6) where age is the new explanatory variable \(x\). Get information about the “best-fitting” line from the regression table by applying the get_regression_table() function. How do the regression results match up with the results from your exploratory data analysis above?


      6.1.3 Observed/fitted values and residuals


      We just saw how to get the value of the intercept and the slope of the regression line from the regression table generated by get_regression_table(). Now instead, say we want information on individual points. In this case, we focus on one of the \(n = 463\) instructors in this dataset, corresponding to a single row of evals_ch6.


      For example, say we are interested in the 21st instructor in this dataset:

Table 6.3: Data for 21st instructor

| score | bty_avg | age |
|---|---|---|
| 4.9 | 7.33 | 31 |

      What is the value on the blue line corresponding to this instructor’s bty_avg of 7.333? In Figure 6.8 we mark three values in particular corresponding to this instructor.

• Red circle: This is the observed value \(y\) = 4.9 and corresponds to this instructor's actual teaching score.
• Red square: This is the fitted value \(\widehat{y}\) and corresponds to the value on the regression line for \(x\) = 7.333. This value is computed using the intercept and slope in the regression table above: \[\widehat{y} = b_0 + b_1 \cdot x = 3.88 + 0.067 \cdot 7.333 = 4.369\]
• Blue arrow: The length of this arrow is the residual and is computed by subtracting the fitted value \(\widehat{y}\) from the observed value \(y\). The residual can be thought of as the error or "lack of fit" of the regression line. In the case of this instructor, it is \(y - \widehat{y}\) = 4.9 - 4.369 = 0.531. In other words, the model was off by 0.531 teaching score units for this instructor.
Figure 6.8: Example of observed value, fitted value, and residual
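Both of these numbers can be verified directly in R using the saved model's coefficients (a quick check, not code from the book):

b <- coef(score_model)                               # "(Intercept)" and "bty_avg"
y_hat <- b["(Intercept)"] + b["bty_avg"] * 7.333     # fitted value, roughly 4.369
4.9 - y_hat                                          # residual, roughly 0.531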

      What if we want both

1. the fitted value \(\widehat{y} = b_0 + b_1 \cdot x\) and
2. the residual \(y - \widehat{y}\)

not only for the 21st instructor but for all 463 instructors in the study? Recall that each instructor corresponds to one of the 463 rows in the evals_ch6 data frame and also to one of the 463 points in the regression plot in Figure 6.6.


      We could repeat the above calculations by hand 463 times, but that would be tedious and time consuming. Instead, let’s use the get_regression_points() function that we’ve included in the moderndive R package. Note that in the table below we only present the results for the 21st through the 24th instructors.

regression_points <- get_regression_points(score_model)
regression_points
Table 6.4: Regression points (for only 21st through 24th instructor)

| ID | score | bty_avg | score_hat | residual |
|---|---|---|---|---|
| 21 | 4.9 | 7.33 | 4.37 | 0.531 |
| 22 | 4.6 | 7.33 | 4.37 | 0.231 |
| 23 | 4.5 | 7.33 | 4.37 | 0.131 |
| 24 | 4.4 | 5.50 | 4.25 | 0.153 |

The inputs to the get_regression_points() function are the same as those to get_regression_table(); however, the outputs are different. Let's inspect the individual columns:

• The score column represents the observed value of the outcome variable \(y\).
• The bty_avg column represents the values of the explanatory variable \(x\).
• The score_hat column represents the fitted values \(\widehat{y}\).
• The residual column represents the residuals \(y - \widehat{y}\).
      +

      get_regression_points() is another example of a wrapper function we described in Figure 6.7. If you’re curious about this function as well, check out Subsection 6.3.4.

      +

      Just as we did for the 21st instructor in the evals_ch6 dataset (in the first row of the table above), let’s repeat the above calculations for the 24th instructor in the evals_ch6 dataset (in the fourth row of the table above):

      +
        +
      • score = 4.4 is the observed value \(y\) for this instructor.
      • +
      • bty_avg = 5.50 is the value of the explanatory variable \(x\) for this instructor.
      • +
      • score_hat = 4.25 = 3.88 + 0.067 * \(x\) = 3.88 + 0.067 * 5.50 is the fitted value \(\widehat{y}\) for this instructor.
      • +
      • residual = 0.153 = 4.4 - 4.25 is the value of the residual for this instructor. In other words, the model was off by 0.153 teaching score units for this instructor.
      • +
      +

      More development of this idea appears in Section 6.3.3 and we encourage you to read that section after you investigate residuals.


      6.1.4 Residual analysis


      Recall the residuals can be thought of as the error or the “lack-of-fit” between the observed value \(y\) and the fitted value \(\widehat{y}\) on the blue regression line in Figure 6.6. Ideally when we fit a regression model, we’d like there to be no systematic pattern to these residuals. We’ll be more specific as to what we mean by no systematic pattern when we see Figure 6.10 below, but let’s keep this notion imprecise for now. Investigating any such patterns is known as residual analysis and is the theme of this section.


      We’ll perform our residual analysis in two ways:

1. Creating a scatterplot with the residuals on the \(y\)-axis and the original explanatory variable \(x\) on the \(x\)-axis.
2. Creating a histogram of the residuals, thereby showing the distribution of the residuals.

      First, recall in Figure 6.8 above we created a scatterplot where

• on the vertical axis we had the teaching score \(y\),
• on the horizontal axis we had the beauty score \(x\), and
• the blue arrow represented the residual for one particular instructor.

      Instead, in Figure 6.9 below, let’s create a scatterplot where

• On the vertical axis we have the residual \(y - \widehat{y}\) instead.
• On the horizontal axis we have the beauty score \(x\) as before:
ggplot(regression_points, aes(x = bty_avg, y = residual)) +
  geom_point() +
  labs(x = "Beauty Score", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue", size = 1)
Figure 6.9: Plot of residuals over beauty score

      You can think of Figure 6.9 as Figure 6.8 but with the blue line flattened out to \(y=0\). Does it seem like there is no systematic pattern to the residuals? This question is rather qualitative and subjective in nature, thus different people may respond with different answers to the above question. However, it can be argued that there isn’t a drastic pattern in the residuals.


      Let’s now get a little more precise in our definition of no systematic pattern in the residuals. Ideally, the residuals should behave randomly. In addition,

1. The residuals should be on average 0. In other words, sometimes the regression model will make a positive error in that \(y - \widehat{y} > 0\), sometimes the regression model will make a negative error in that \(y - \widehat{y} < 0\), but on average the error is 0.
2. Further, the value and spread of the residuals should not depend on the value of \(x\).

      In Figure 6.10 below, we display some hypothetical examples where there are drastic patterns to the residuals. In Example 1, the value of the residual seems to depend on \(x\): the residuals tend to be positive for small and large values of \(x\) in this range, whereas values of \(x\) more in the middle tend to have negative residuals. In Example 2, while the residuals seem to be on average 0 for each value of \(x\), the spread of the residuals varies for different values of \(x\); this situation is known as heteroskedasticity.

Figure 6.10: Examples of less than ideal residual patterns

      The second way to perform a residual analysis is to look at the histogram of the residuals:

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 0.25, color = "white") +
  labs(x = "Residual")
Figure 6.11: Histogram of residuals

This histogram seems to indicate that we have more positive residuals than negative. Since the residual \(y - \widehat{y}\) is positive when \(y > \widehat{y}\), it seems our fitted teaching score from the regression model tends to underestimate the true teaching score. This histogram has a slight left-skew in that there is a long tail on the left. Another way to say this is that this data exhibits a negative skew. Is this a problem? Again, there is a certain amount of subjectivity in the response. In the authors' opinion, while there is a slight skew/pattern to the residuals, it isn't a large concern. On the other hand, others might disagree with our assessment. Here are examples of an ideal and less than ideal pattern to the residuals when viewed in a histogram:

Figure 6.12: Examples of ideal and less than ideal residual patterns

In fact, we'll see later on that we would like the residuals to be normally distributed with mean 0. In other words, be bell-shaped and centered at 0! While this requirement and residual analysis in general may seem to some of you as not being overly critical at this point, we'll see later, when we cover inference for regression in Chapter 11, that for the last five columns of the regression table from earlier (std_error, statistic, p_value, lower_ci, and upper_ci) to have valid interpretations, the above three conditions should roughly hold.
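As a quick numerical sanity check of the "on average 0" condition (a small sketch, not code from the book), one could compute the mean of the residuals stored in regression_points directly; for a least-squares fit with an intercept it should come out essentially zero:

mean(regression_points$residual)   # essentially 0, up to rounding of the stored residuals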


Learning check

      (LC6.3) Continuing with our regression using age as the explanatory variable and teaching score as the outcome variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 463 instructors. Perform a residual analysis and look for any systematic patterns in the residuals. Ideally, there should be little to no pattern.


      6.2 One categorical explanatory variable


      It’s an unfortunate truth that life expectancy is not the same across various countries in the world; there are a multitude of factors that are associated with how long people live. International development agencies are very interested in studying these differences in the hope of understanding where governments should allocate resources to address this problem. In this section, we’ll explore differences in life expectancy in two ways:

1. Differences between continents: Are there significant differences in life expectancy, on average, between the five continents of the world: Africa, the Americas, Asia, Europe, and Oceania?
2. Differences within continents: How does life expectancy vary within the world's five continents? For example, is the spread of life expectancy among the countries of Africa larger than the spread of life expectancy among the countries of Asia?

      To answer such questions, we’ll study the gapminder dataset in the gapminder package. Recall we mentioned this dataset in Subsection 3.1.2 when we first studied the “Grammar of Graphics” introduced in Figure 3.1. This dataset has international development statistics such as life expectancy, GDP per capita, and population by country (\(n\) = 142) for 5-year intervals between 1952 and 2007.


      We’ll use this data for linear regression again, but note that our explanatory variable \(x\) is now categorical, and not numerical like when we covered simple linear regression in Section 6.1. More precisely, we have:

1. A numerical outcome variable \(y\). In this case, life expectancy.
2. A single categorical explanatory variable \(x\). In this case, the continent the country is part of.

      When the explanatory variable \(x\) is categorical, the concept of a “best-fitting” line is a little different than the one we saw previously in Section 6.1 where the explanatory variable \(x\) was numerical. We’ll study these differences shortly in Subsection 6.2.2, but first we conduct our exploratory data analysis.


      6.2.1 Exploratory data analysis


      Let’s load the gapminder data and filter() for only observations in 2007. Next we select() only the variables we’ll need along with gdpPercap, which is each country’s gross domestic product per capita (GDP). GDP is a rough measure of that country’s economic performance. (This will be used for the upcoming Learning Check). Lastly, we save this in a data frame with name gapminder2007:

library(gapminder)
gapminder2007 <- gapminder %>%
  filter(year == 2007) %>%
  select(country, continent, lifeExp, gdpPercap)

You should look at the raw data values both by bringing up RStudio's spreadsheet viewer and by using the glimpse() function. In Table 6.5 we only show 5 randomly selected countries out of 142:

View(gapminder2007)
Table 6.5: Random sample of 5 countries

| country | continent | lifeExp | gdpPercap |
|---|---|---|---|
| Slovak Republic | Europe | 74.7 | 18678 |
| Israel | Asia | 80.7 | 25523 |
| Bulgaria | Europe | 73.0 | 10681 |
| Tanzania | Africa | 52.5 | 1107 |
| Myanmar | Asia | 62.1 | 944 |
glimpse(gapminder2007)

Observations: 142
Variables: 4
$ country   <fct> Afghanistan, Albania, Algeria, Angola, Argentina, Australia…
$ continent <fct> Asia, Europe, Africa, Africa, Americas, Oceania, Europe, As…
$ lifeExp   <dbl> 43.8, 76.4, 72.3, 42.7, 75.3, 81.2, 79.8, 75.6, 64.1, 79.4,…
$ gdpPercap <dbl> 975, 5937, 6223, 4797, 12779, 34435, 36126, 29796, 1391, 33…

      We see that the variable continent is indeed categorical, as it is encoded as fct which stands for “factor.” This is R’s way of storing categorical variables. Let’s once again apply the skim() function from the skimr package to our two variables of interest: continent and lifeExp:

gapminder2007 %>%
  select(continent, lifeExp) %>%
  skim()

Skim summary statistics
 n obs: 142 
 n variables: 2 

── Variable type:factor ──────
  variable missing complete   n n_unique                         top_counts ordered
 continent       0      142 142        5 Afr: 52, Asi: 33, Eur: 30, Ame: 25   FALSE

── Variable type:numeric ─────
 variable missing complete   n  mean    sd    p0   p25   p50   p75 p100     hist
  lifeExp       0      142 142 67.01 12.07 39.61 57.16 71.94 76.41 82.6 ▂▂▂▂▂▃▇▇

      The output now reports summaries for categorical variables (the variable type: factor) separately from the numerical variables. For the categorical variable continent it now reports:

• missing, complete, and n, as before, which are the number of missing, complete, and total number of values.
• n_unique: the number of unique levels of this variable, corresponding to Africa, Asia, Americas, Europe, and Oceania.
• top_counts: in this case the top four counts: Africa has 52 entries each corresponding to a country, Asia has 33, Europe has 30, and the Americas has 25. Not displayed is Oceania with 2 countries.
• ordered: reporting whether the variable is "ordinal." In this case, it is not ordered.

      Given that the global median life expectancy is 71.94, half of the world’s countries (71 countries) will have a life expectancy less than 71.94. Further, half will have a life expectancy greater than this value. The mean life expectancy of 67.01 is lower however. Why are these two values different? Let’s look at a histogram of lifeExp in Figure 6.13 to see why.
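The code that produced Figure 6.13 isn't shown above, but a histogram along these lines could be created as follows (the binwidth of 5 years is an illustrative guess, not necessarily the book's setting):

ggplot(gapminder2007, aes(x = lifeExp)) +
  geom_histogram(binwidth = 5, color = "white") +
  labs(x = "Life expectancy", y = "Number of countries",
       title = "Histogram of life expectancy in 2007")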

Figure 6.13: Histogram of Life Expectancy in 2007

      We see that this data is left-skewed/negatively skewed: there are a few countries with very low life expectancies that are bringing down the mean life expectancy. However, the median is less sensitive to the effects of such outliers. Hence the median is greater than the mean in this case. Let’s proceed by comparing median and mean life expectancy between continents by adding a group_by(continent) to the above code:

lifeExp_by_continent <- gapminder2007 %>%
  group_by(continent) %>%
  summarize(median = median(lifeExp), mean = mean(lifeExp))
Table 6.6: Life expectancy by continent

| continent | median | mean |
|---|---|---|
| Africa | 52.9 | 54.8 |
| Americas | 72.9 | 73.6 |
| Asia | 72.4 | 70.7 |
| Europe | 78.6 | 77.6 |
| Oceania | 80.7 | 80.7 |

      We see now that there are differences in life expectancies between the continents. For example let’s focus on only medians. While the median life expectancy across all \(n = 142\) countries in 2007 was 71.935, the median life expectancy across the \(n =52\) countries in Africa was only 52.927.


      Let’s create a corresponding visualization. One way to compare the life expectancies of countries in different continents would be via a faceted histogram. Recall we saw back in the Data Visualization chapter, specifically Section 3.6, that facets allow us to split a visualization by the different levels of a categorical variable or factor variable. In Figure 6.14, the variable we facet by is continent, which is categorical with five levels, each corresponding to the five continents of the world.

ggplot(gapminder2007, aes(x = lifeExp)) +
  geom_histogram(binwidth = 5, color = "white") +
  labs(x = "Life expectancy", y = "Number of countries",
       title = "Life expectancy by continent") +
  facet_wrap(~ continent, nrow = 2)
Figure 6.14: Life expectancy in 2007

      Another way would be via a geom_boxplot where we map the categorical variable continent to the \(x\)-axis and the different life expectancies within each continent on the \(y\)-axis; we do this in Figure 6.15.

ggplot(gapminder2007, aes(x = continent, y = lifeExp)) +
  geom_boxplot() +
  labs(x = "Continent", y = "Life expectancy (years)",
       title = "Life expectancy by continent")
Figure 6.15: Life expectancy in 2007

Some people prefer comparing a numerical variable between different levels of a categorical variable, in this case comparing life expectancy between different continents, using a boxplot over a faceted histogram, as we can make quick comparisons with single horizontal lines. For example, we can see that even the country with the highest life expectancy in Africa still has a lower life expectancy than every country in Oceania.


      It’s important to remember however that the solid lines in the middle of the boxes correspond to the medians (i.e. the middle value) rather than the mean (the average). So, for example, if you look at Asia, the solid line denotes the median life expectancy of around 72 years, indicating to us that half of all countries in Asia have a life expectancy below 72 years whereas half of all countries in Asia have a life expectancy above 72 years. Furthermore, note that:

• Africa and Asia have much more spread/variation in life expectancy as indicated by the interquartile range (the height of the boxes).
• Oceania has almost no spread/variation, but this might in large part be due to the fact there are only two countries in Oceania: Australia and New Zealand.

Now, let's start making comparisons of life expectancy between continents. Let's use Africa as a baseline for comparison. Why Africa? Only because it happens to be first alphabetically; we could have just as appropriately used the Americas as the baseline for comparison. Using the "eyeball test" (just using our eyes to see if anything stands out), we make the following observations about differences in median life expectancy compared to the baseline of Africa:

1. The median life expectancy of the Americas is roughly 20 years greater.
2. The median life expectancy of Asia is roughly 20 years greater.
3. The median life expectancy of Europe is roughly 25 years greater.
4. The median life expectancy of Oceania is roughly 27.8 years greater.

      Let’s remember these four differences vs Africa corresponding to the Americas, Asia, Europe, and Oceania: 20, 20, 25, 27.8.


Learning check

      (LC6.4) Conduct a new exploratory data analysis with the same explanatory variable \(x\) being continent but with gdpPercap as the new outcome variable \(y\). Remember, this involves three things:

1. Looking at the raw values.
2. Computing summary statistics of the variables of interest.
3. Creating informative visualizations.

      What can you say about the differences in GDP per capita between continents based on this exploration?


      6.2.2 Linear regression


In Subsection 6.1.2 we introduced simple linear regression, which involves modeling a numerical outcome variable \(y\) as a function of a numerical explanatory variable \(x\). In our life expectancy example, we now instead have a categorical explanatory variable \(x\), continent. While we can still fit a regression model, given our categorical explanatory variable we no longer have a concept of a "best-fitting" line, but rather "differences relative to a baseline for comparison."


      Before we fit our regression model, let’s create a table similar to Table 6.6, but

1. Report the mean life expectancy for each continent.
2. Report the difference in mean life expectancy relative to Africa's mean life expectancy of 54.806 in the column "mean vs Africa"; this column is simply the "mean" column minus 54.806.

      Think back to your observations from the eyeball test of Figure 6.15 at the end of the last subsection. The column “mean vs Africa” is the same idea of comparing a summary statistic to a baseline for comparison, in this case the countries of Africa, but using means instead of medians.
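One way such a table could be computed with dplyr is sketched below; this is an added illustration, and the object name lifeExp_by_continent is our own choice:

lifeExp_by_continent <- gapminder2007 %>%
  group_by(continent) %>%
  summarize(mean = mean(lifeExp)) %>%
  mutate(`mean vs Africa` = mean - mean[continent == "Africa"])
lifeExp_by_continent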

Table 6.7: Mean life expectancy by continent

continent    mean   mean vs Africa
Africa       54.8    0.0
Americas     73.6   18.8
Asia         70.7   15.9
Europe       77.6   22.8
Oceania      80.7   25.9

Now, let’s use the get_regression_table() function we introduced in Section 6.1.2 to get the regression table for our gapminder2007 analysis:

lifeExp_model <- lm(lifeExp ~ continent, data = gapminder2007)
get_regression_table(lifeExp_model)
Table 6.8: Linear regression table

term               estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept              54.8       1.02      53.45        0      52.8      56.8
continentAmericas      18.8       1.80      10.45        0      15.2      22.4
continentAsia          15.9       1.65       9.68        0      12.7      19.2
continentEurope        22.8       1.70      13.47        0      19.5      26.2
continentOceania       25.9       5.33       4.86        0      15.4      36.5

      Just as before, we have the term and estimates columns of interest, but unlike before, we now have 5 rows corresponding to 5 outputs in our table: an intercept like before, but also continentAmericas, continentAsia, continentEurope, and continentOceania. What are these values? First, we must describe the equation for fitted value \(\widehat{y}\), which is a little more complicated when the \(x\) explanatory variable is categorical:

\[
\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot\mathbb{1}_{\mbox{Amer}}(x) + 15.9\cdot\mathbb{1}_{\mbox{Asia}}(x) + 22.8\cdot\mathbb{1}_{\mbox{Euro}}(x) + 25.9\cdot\mathbb{1}_{\mbox{Ocean}}(x)
\end{align}
\]

Let’s break this down. First, \(\mathbb{1}_{A}(x)\) is what’s known in mathematics as an “indicator function” that takes one of two possible values:

\[
\mathbb{1}_{A}(x) = \left\{
\begin{array}{ll}
1 & \text{if } x \text{ is in } A \\
0 & \text{otherwise}
\end{array}
\right.
\]

In a statistical modeling context this is also known as a “dummy variable”. In our case, let’s consider the first such indicator variable:

\[
\mathbb{1}_{\mbox{Amer}}(x) = \left\{
\begin{array}{ll}
1 & \text{if } \text{country } x \text{ is in the Americas} \\
0 & \text{otherwise}
\end{array}
\right.
\]
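R creates these indicator variables for us behind the scenes whenever a categorical variable appears in a regression formula. As an added illustration (not part of the original text), you can inspect them with base R's model.matrix():

# One column per non-baseline indicator variable, plus a column of 1's for the intercept
model.matrix(lifeExp ~ continent, data = gapminder2007) %>% 
  head()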

Now let’s interpret the terms in the estimate column of the regression table. First \(b_0 =\) intercept = 54.8 corresponds to the mean life expectancy for countries in Africa, since for country \(x\) in Africa we have the following equation:

\[
\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot\mathbb{1}_{\mbox{Amer}}(x) + 15.9\cdot\mathbb{1}_{\mbox{Asia}}(x) + 22.8\cdot\mathbb{1}_{\mbox{Euro}}(x) + 25.9\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot 0 + 15.9\cdot 0 + 22.8\cdot 0 + 25.9\cdot 0\\
&= 54.8
\end{align}
\]

i.e. all four of the indicator variables are equal to 0. Recall we stated earlier that we would treat Africa as the baseline group for comparison. Furthermore, this value corresponds to the group mean life expectancy for all African countries in Table 6.7.

Next, \(b_{\text{Amer}}\) = continentAmericas = 18.8 is the difference in mean life expectancy of countries in the Americas relative to Africa, or in other words, on average countries in the Americas had a life expectancy 18.8 years greater. The fitted value yielded by this equation is:

\[
\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot\mathbb{1}_{\mbox{Amer}}(x) + 15.9\cdot\mathbb{1}_{\mbox{Asia}}(x) + 22.8\cdot\mathbb{1}_{\mbox{Euro}}(x) + 25.9\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot 1 + 15.9\cdot 0 + 22.8\cdot 0 + 25.9\cdot 0\\
&= 54.8 + 18.8\\
&= 73.6
\end{align}
\]

i.e. in this case, only the indicator function \(\mathbb{1}_{\mbox{Amer}}(x)\) is equal to 1, while all others are 0. Recall that 73.6 corresponds to the group mean life expectancy for all countries in the Americas in Table 6.7.

Similarly, \(b_{\text{Asia}}\) = continentAsia = 15.9 is the difference in mean life expectancy of Asian countries relative to African countries, or in other words, on average countries in Asia had a life expectancy 15.9 years greater than those in Africa. The fitted value yielded by this equation is:

\[
\begin{align}
\widehat{\text{life exp}} &= b_0 + b_{\text{Amer}}\cdot\mathbb{1}_{\mbox{Amer}}(x) + b_{\text{Asia}}\cdot\mathbb{1}_{\mbox{Asia}}(x) + b_{\text{Euro}}\cdot\mathbb{1}_{\mbox{Euro}}(x) + b_{\text{Ocean}}\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot\mathbb{1}_{\mbox{Amer}}(x) + 15.9\cdot\mathbb{1}_{\mbox{Asia}}(x) + 22.8\cdot\mathbb{1}_{\mbox{Euro}}(x) + 25.9\cdot\mathbb{1}_{\mbox{Ocean}}(x)\\
&= 54.8 + 18.8\cdot 0 + 15.9\cdot 1 + 22.8\cdot 0 + 25.9\cdot 0\\
&= 54.8 + 15.9\\
&= 70.7
\end{align}
\]

      +

      i.e. in this case, only the indicator function \(\mathbb{1}_{\mbox{Asia}}(x)\) is equal to 1, but all others are 0. Recall that 70.7 corresponds to the group mean life expectancy for all countries in Asia in Table 6.7. The same logic applies to \(b_{\text{Euro}} = 22.8\) and \(b_{\text{Ocean}} = 25.9\); they correspond to the “offset” in mean life expectancy for countries in Europe and Oceania, relative to the mean life expectancy of the baseline group for comparison of African countries.
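One added way to double-check these fitted values (predict() is not a function relied on elsewhere in this analysis) is with base R:

# Fitted values for a country in each of three continents
predict(lifeExp_model, newdata = data.frame(continent = c("Africa", "Americas", "Asia")))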

Let’s generalize this idea a bit. If we fit a linear regression model using a categorical explanatory variable \(x\) that has \(k\) levels, the regression model will return an intercept and \(k - 1\) “slope” coefficients. When \(x\) is a numerical explanatory variable the interpretation is that of a “slope” coefficient, but when \(x\) is categorical the meaning is a little trickier: the \(k - 1\) coefficients are offsets relative to the baseline group.

      +

      In our case, since there are \(k = 5\) continents, the regression model returns an intercept corresponding to the baseline for comparison Africa and \(k - 1 = 4\) slope coefficients corresponding to the Americas, Asia, Europe, and Oceania. Africa was chosen as the baseline by R for no other reason than it is first alphabetically of the 5 continents. You can manually specify which continent to use as baseline instead of the default choice of whichever comes first alphabetically, but we leave that to a more advanced course. (The forcats package is particularly nice for doing this and we encourage you to explore using it.)
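For instance, a minimal added sketch using forcats (which is not one of the packages loaded for this chapter) to make Oceania the baseline might look like the following; the object names are our own:

library(forcats)

# Move Oceania to the front of the factor levels so it becomes the baseline
gapminder2007_relevel <- gapminder2007 %>%
  mutate(continent = fct_relevel(continent, "Oceania"))
lifeExp_model_relevel <- lm(lifeExp ~ continent, data = gapminder2007_relevel)
get_regression_table(lifeExp_model_relevel)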

Learning check

      (LC6.5) Fit a new linear regression using lm(gdpPercap ~ continent, data = gapminder2007) where gdpPercap is the new outcome variable \(y\). Get information about the “best-fitting” line from the regression table by applying the get_regression_table() function. How do the regression results match up with the results from your exploratory data analysis above?


      6.2.3 Observed/fitted values and residuals

Recall in Subsection 6.1.3 when we had a numerical explanatory variable \(x\), we defined:

1. Observed values \(y\), or the observed value of the outcome variable
2. Fitted values \(\widehat{y}\), or the value on the regression line for a given \(x\) value
3. Residuals \(y - \widehat{y}\), or the error between the observed value and the fitted value

      What do fitted values \(\widehat{y}\) and residuals \(y - \widehat{y}\) correspond to when the explanatory variable \(x\) is categorical? Let’s investigate these values for the first 10 countries in the gapminder2007 dataset:

Table 6.9: First 10 out of 142 countries

country      continent  lifeExp  gdpPercap
Afghanistan  Asia          43.8        975
Albania      Europe        76.4       5937
Algeria      Africa        72.3       6223
Angola       Africa        42.7       4797
Argentina    Americas      75.3      12779
Australia    Oceania       81.2      34435
Austria      Europe        79.8      36126
Bahrain      Asia          75.6      29796
Bangladesh   Asia          64.1       1391
Belgium      Europe        79.4      33693

      Recall the get_regression_points() function we used in Subsection 6.1.3 to return

• the observed value of the outcome variable,
• all explanatory variables,
• fitted values, and
• residuals for all points in the regression. In this case, each “point” (i.e. each row) corresponds to one of the 142 countries in the gapminder2007 dataset. They are also the 142 observations used to construct the boxplots in Figure 6.15.

regression_points <- get_regression_points(lifeExp_model)
regression_points
Table 6.10: Regression points (First 10 out of 142 countries)

ID  lifeExp  continent  lifeExp_hat  residual
1      43.8  Asia              70.7   -26.900
2      76.4  Europe            77.6    -1.226
3      72.3  Africa            54.8    17.495
4      42.7  Africa            54.8   -12.075
5      75.3  Americas          73.6     1.712
6      81.2  Oceania           80.7     0.515
7      79.8  Europe            77.6     2.180
8      75.6  Asia              70.7     4.907
9      64.1  Asia              70.7    -6.666
10     79.4  Europe            77.6     1.792

      Notice

• The fitted values lifeExp_hat \(\widehat{\text{lifeexp}}\). Countries in Africa have the same fitted value of 54.8, which is the mean life expectancy of Africa. Countries in Asia have the same fitted value of 70.7, which is the mean life expectancy of Asia. This similarly holds for countries in the Americas, Europe, and Oceania.
• The residual column is simply \(y - \widehat{y}\) = lifeexp - lifeexp_hat. These values can be interpreted as a particular country’s deviation from the mean life expectancy of its respective continent. For example, the first row of this dataset corresponds to Afghanistan, and the residual of \(-26.9 = 43.8 - 70.7\) is Afghanistan’s life expectancy minus the mean life expectancy of all Asian countries.

      6.2.4 Residual analysis

Recall our discussion on residuals from Section 6.1.4 where our goal was to investigate whether or not there was a systematic pattern to the residuals. Ideally, since residuals can be thought of as error, there should be no such pattern. While there are many ways to do such residual analysis, we focused on two approaches based on visualizations:

1. A plot with residuals on the vertical axis and the predictor (in this case continent) on the horizontal axis
2. A histogram of all residuals

      First, let’s plot the residuals versus continent in Figure 6.16, but also let’s plot all 142 points with a little horizontal random jitter by setting the width = 0.1 parameter in geom_jitter():

ggplot(regression_points, aes(x = continent, y = residual)) +
  geom_jitter(width = 0.1) +
  labs(x = "Continent", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue")

Figure 6.16: Plot of residuals over continent


      We observe

1. There seems to be a rough balance of both positive and negative residuals for all 5 continents.
2. However, there is one clear outlier in Asia. It has the smallest residual, hence also the smallest life expectancy in Asia.

      Let’s investigate the 5 countries in Asia with the shortest life expectancy:

gapminder2007 %>%
  filter(continent == "Asia") %>%
  arrange(lifeExp)
Table 6.11: Countries in Asia with shortest life expectancy

country      continent  lifeExp  gdpPercap
Afghanistan  Asia          43.8        975
Iraq         Asia          59.5       4471
Cambodia     Asia          59.7       1714
Myanmar      Asia          62.1        944
Yemen, Rep.  Asia          62.7       2281

This is the earlier identified residual for Afghanistan of -26.9. Unfortunately, given the geopolitical turmoil of recent decades, individuals living in Afghanistan, in particular in 2007, had a drastically lower life expectancy.

Second, let’s look at a histogram of all 142 values of residuals in Figure 6.17. In this case, the residuals form a rather nice bell-shape, although there are a couple of very low and very high values at the tails. As we said previously, searching for patterns in residuals can be somewhat subjective, but ideally we hope there are no “drastic” patterns.

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 5, color = "white") +
  labs(x = "Residual")

Figure 6.17: Histogram of residuals

Learning check

      (LC6.6) Continuing with our regression using gdpPercap as the outcome variable and continent as the explanatory variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 142 countries in 2007 and perform a residual analysis to look for any systematic patterns in the residuals. Is there a pattern? Please keep in mind that these types of questions are somewhat subjective and different people will most likely have different answers. The focus should be on being able to justify the conclusions made.


      6.4 Conclusion

In this chapter, you’ve seen what we call “basic regression” when you only have one explanatory variable. In Chapter 7, we’ll study multiple regression where we have more than one explanatory variable! In particular, we’ll see why we’ve been conducting the residual analyses from Subsections 6.1.4 and 6.2.4. We are actually verifying some very important assumptions that must be met for the std_error (standard error), p_value, lower_ci and upper_ci (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation. Again, don’t worry for now if you don’t understand what these terms mean. After the next chapter on multiple regression, we’ll dive in!


      6.4.1 Script of R code

      +

      An R script file of all R code used in this chapter is available here.


      7 Multiple Regression

      +

      In Chapter 6 we introduced ideas related to modeling, in particular that the fundamental premise of modeling is to make explicit the relationship between an outcome variable \(y\) and an explanatory/predictor variable \(x\). Recall further the synonyms that we used to also denote \(y\) as the dependent variable and \(x\) as an independent variable or covariate.

There are many modeling approaches one could take, among the most well-known being linear regression, which was the focus of the last chapter. Whereas in the last chapter we focused solely on regression scenarios where there is only one explanatory/predictor variable, in this chapter we now focus on modeling scenarios where there is more than one. This case of regression with more than one explanatory variable is known as multiple regression. You can imagine when trying to model a particular outcome variable, like teaching evaluation score as in Section 6.1 or life expectancy as in Section 6.2, it would be very useful to incorporate more than one explanatory variable.

Since our regression models will now consider more than one explanatory/predictor variable, the interpretation of the associated effect of any one explanatory/predictor variable must be made in conjunction with the others. For example, say we are modeling individuals’ incomes as a function of their number of years of education and their parents’ wealth. When interpreting the effect of education on income, one has to consider the effect of their parents’ wealth at the same time, as these two variables are almost certainly related. Make note of this throughout this chapter and as you work on interpreting the results of multiple regression models into the future.


      Needed packages

Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). Read Section 2.3 for information on how to install and load R packages.

library(ggplot2)
library(dplyr)
library(moderndive)
library(ISLR)
library(skimr)

      DataCamp

      +

      The approach taken below of using more than one variable of information in models using multiple regression is identical to that taken in ModernDive co-author Albert Y. Kim’s DataCamp course “Modeling with Data in the Tidyverse.” If you’re interested in complementing your learning below in an interactive online environment, click on the image below to access the course. The relevant chapters are Chapter 1 “Introduction to Modeling” and Chapter 3 “Modeling with Multiple Regression”.

      +

      7.1 Two numerical explanatory variables

      +

Let’s now attempt to identify factors that are associated with how much credit card debt an individual will have. The textbook An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani is an intermediate-level textbook on statistical and machine learning, freely available here. It has an accompanying R package called ISLR with datasets that the authors use to demonstrate various machine learning methods. One dataset that is frequently used by the authors is the Credit dataset, where predictions are made on the credit card balance held by \(n = 400\) credit card holders. These predictions are based on information about them like income, credit limit, and education level. Note that this dataset is not based on actual individuals; it is a simulated dataset used for educational purposes.

      +

      Since no information was provided as to who these \(n\) = 400 individuals are and how they came to be included in this dataset, it will be hard to make any scientific claims based on this data. Recall our discussion from the previous chapter that correlation does not necessarily imply causation. That being said, we’ll still use Credit to demonstrate multiple regression with:

1. A numerical outcome variable \(y\), in this case credit card balance.
2. Two explanatory variables:
   1. A first numerical explanatory variable \(x_1\). In this case, their credit limit.
   2. A second numerical explanatory variable \(x_2\). In this case, their income (in thousands of dollars).

      In the forthcoming Learning Checks, we’ll consider a different scenario:

1. The same numerical outcome variable \(y\): credit card balance.
2. Two new explanatory variables:
   1. A first numerical explanatory variable \(x_1\): their credit rating.
   2. A second numerical explanatory variable \(x_2\): their age.

      7.1.1 Exploratory data analysis

      +

      Let’s load the Credit data and select() only the needed subset of variables.

library(ISLR)
Credit <- Credit %>%
  select(Balance, Limit, Income, Rating, Age)

Let’s look at the raw data values both by bringing up RStudio’s spreadsheet viewer and by using the glimpse() function, although in Table 7.1 we only show 5 randomly selected credit card holders out of 400:

View(Credit)

Table 7.1: Random sample of 5 credit card holders

Balance  Limit  Income  Rating  Age
   1425   6045    39.8     459   32
    279   3300    15.1     266   66
    204   5308    80.6     394   57
   1050   9310   180.4     665   67
     15   4952    88.8     360   86
glimpse(Credit)

Observations: 400
Variables: 5
$ Balance <int> 333, 903, 580, 964, 331, 1151, 203, 872, 279, 1350, 1407, 0, …
$ Limit   <int> 3606, 6645, 7075, 9504, 4897, 8047, 3388, 7114, 3300, 6819, 8…
$ Income  <dbl> 14.9, 106.0, 104.6, 148.9, 55.9, 80.2, 21.0, 71.4, 15.1, 71.1…
$ Rating  <int> 283, 483, 514, 681, 357, 569, 259, 512, 266, 491, 589, 138, 3…
$ Age     <int> 34, 82, 71, 36, 68, 77, 37, 87, 66, 41, 30, 64, 57, 49, 75, 5…

      Let’s look at some summary statistics, again using the skim() function from the skimr package:

Credit %>% 
  select(Balance, Limit, Income) %>% 
  skim()

Skim summary statistics
 n obs: 400 
 n variables: 3 

── Variable type:integer ─────
 variable missing complete   n    mean      sd  p0     p25    p50     p75  p100     hist
  Balance       0      400 400  520.01  459.76   0   68.75  459.5  863     1999 ▇▃▃▃▂▁▁▁
    Limit       0      400 400 4735.6  2308.2  855 3088    4622.5 5872.75 13913 ▅▇▇▃▂▁▁▁

── Variable type:numeric ─────
 variable missing complete   n  mean    sd    p0   p25   p50   p75   p100     hist
   Income       0      400 400 45.22 35.24 10.35 21.01 33.12 57.47 186.63 ▇▃▂▁▁▁▁▁

      We observe for example:

1. The mean and median credit card balance are $520.01 and $459.50 respectively.
2. 25% of card holders had debts of $68.75 or less.
3. The mean and median credit card limit are $4735.60 and $4622.50 respectively.
4. 75% of these card holders had incomes of $57,470 or less.

Since our outcome variable Balance and the explanatory variables Limit and Income are numerical, we can compute the correlation coefficient between pairs of these variables. First, we could run the get_correlation() command as seen in Subsection 6.1.1 twice, once for each explanatory variable:

Credit %>% 
  get_correlation(Balance ~ Limit)
Credit %>% 
  get_correlation(Balance ~ Income)

Or we can simultaneously compute them by returning a correlation matrix in Table 7.2. We can read off the correlation coefficient for any pair of variables by looking them up in the appropriate row/column combination.

Credit %>%
  select(Balance, Limit, Income) %>% 
  cor()

Table 7.2: Correlations between credit card balance, credit limit, and income

         Balance  Limit  Income
Balance    1.000  0.862   0.464
Limit      0.862  1.000   0.792
Income     0.464  0.792   1.000

      For example, the correlation coefficient of:

1. Balance with itself is 1 as we would expect based on the definition of the correlation coefficient.
2. Balance with Limit is 0.862. This indicates a strong positive linear relationship, which makes sense as only individuals with large credit limits can accrue large credit card balances.
3. Balance with Income is 0.464. This is suggestive of another positive linear relationship, although not as strong as the relationship between Balance and Limit.
4. As an added bonus, we can read off the correlation coefficient of the two explanatory variables, Limit and Income, of 0.792. In this case, we say there is a high degree of collinearity between these two explanatory variables.

      Collinearity (or multicollinearity) is a phenomenon in which one explanatory variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. So in this case, if we knew someone’s credit card Limit and since Limit and Income are highly correlated, we could make a fairly accurate guess as to that person’s Income. Or put loosely, these two variables provided redundant information. For now let’s ignore any issues related to collinearity and press on.

      +

      Let’s visualize the relationship of the outcome variable with each of the two explanatory variables in two separate plots:

ggplot(Credit, aes(x = Limit, y = Balance)) +
  geom_point() +
  labs(x = "Credit limit (in $)", y = "Credit card balance (in $)", 
       title = "Relationship between balance and credit limit") +
  geom_smooth(method = "lm", se = FALSE)

ggplot(Credit, aes(x = Income, y = Balance)) +
  geom_point() +
  labs(x = "Income (in $1000)", y = "Credit card balance (in $)", 
       title = "Relationship between balance and income") +
  geom_smooth(method = "lm", se = FALSE)

Figure 7.1: Relationship between credit card balance and credit limit/income

      First, there is a positive relationship between credit limit and balance, since as credit limit increases so also does credit card balance; this is to be expected given the strongly positive correlation coefficient of 0.862. In the case of income, the positive relationship doesn’t appear as strong, given the weakly positive correlation coefficient of 0.464. However the two plots in Figure 7.1 only focus on the relationship of the outcome variable with each of the explanatory variables independently. To get a sense of the joint relationship of all three variables simultaneously through a visualization, let’s display the data in a 3-dimensional (3D) scatterplot, where

1. The numerical outcome variable \(y\) Balance is on the z-axis (vertical axis).
2. The two numerical explanatory variables form the “floor” axes. In this case:
   1. The first numerical explanatory variable \(x_1\) Income is on one of the floor axes.
   2. The second numerical explanatory variable \(x_2\) Limit is on the other floor axis.

      Click on the following image to open an interactive 3D scatterplot in your browser:


      Previously in Figure 6.6, we plotted a “best-fitting” regression line through a set of points where the numerical outcome variable \(y\) was teaching score and a single numerical explanatory variable \(x\) was bty_avg. What is the analogous concept when we have two numerical predictor variables? Instead of a best-fitting line, we now have a best-fitting plane, which is a 3D generalization of lines which exist in 2D. Click on the following image to open an interactive plot of the regression plane in your browser. Move the image around, zoom in, and think about how this plane generalizes the concept of a linear regression line to three dimensions.
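If you'd like to build such an interactive 3D scatterplot yourself, one option is the plotly package. The sketch below is an addition of ours, and plotly is not one of the packages loaded for this chapter:

library(plotly)

# Balance on the vertical axis, Income and Limit on the two "floor" axes
plot_ly(Credit, x = ~Income, y = ~Limit, z = ~Balance,
        type = "scatter3d", mode = "markers")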

Learning check

      (LC7.1) Conduct a new exploratory data analysis with the same outcome variable \(y\) being Balance but with Rating and Age as the new explanatory variables \(x_1\) and \(x_2\). Remember, this involves three things:

1. Looking at the raw values.
2. Computing summary statistics of the variables of interest.
3. Creating informative visualizations.

      What can you say about the relationship between a credit card holder’s balance and their credit rating and age?


      7.1.2 Multiple regression

      +

Just as we did when we had a single numerical explanatory variable \(x\) in Subsection 6.1.2 and a single categorical explanatory variable \(x\) in Subsection 6.2.2, let’s fit a regression model and obtain the regression table for our two numerical explanatory variable scenario. To fit a regression model and get a table using get_regression_table(), we now use a + to consider multiple explanatory variables. In this case since we want to perform a regression of Limit and Income simultaneously, we input Balance ~ Limit + Income.

Balance_model <- lm(Balance ~ Limit + Income, data = Credit)
get_regression_table(Balance_model)
Table 7.3: Multiple regression table

term       estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept  -385.179     19.465      -19.8        0  -423.446  -346.912
Limit         0.264      0.006       45.0        0     0.253     0.276
Income       -7.663      0.385      -19.9        0    -8.420    -6.906

      How do we interpret these three values that define the regression plane?

• Intercept: -$385.18 (rounded to two decimal points to represent cents). The intercept in our case represents the credit card balance for an individual who has both a credit Limit of $0 and Income of $0. In our data however, the intercept has limited practical interpretation as no individuals had Limit or Income values of $0 and furthermore the smallest credit card balance was $0. Rather, it is used to situate the regression plane in 3D space.
• Limit: $0.26. Now that we have multiple variables to consider, we have to add a caveat to our interpretation: taking all other variables in our model into account, for every increase of one unit in credit Limit (dollars), there is an associated increase of on average $0.26 in credit card balance. Note:
   • Just as we did in Subsection 6.1.2, we are not making any causal statements, only statements relating to the association between credit limit and balance.
   • We need to preface our interpretation of the associated effect of Limit with the statement “taking all other variables into account”, in this case Income, to emphasize that we are now jointly interpreting the associated effect of multiple explanatory variables in the same model and not in isolation.
• Income: -$7.66. Similarly, taking all other variables into account, for every increase of one unit in Income (in other words, $1000 in income), there is an associated decrease of on average $7.66 in credit card balance.

      However, recall in Figure 7.1 that when considered separately, both Limit and Income had positive relationships with the outcome variable Balance. As card holders’ credit limits increased their credit card balances tended to increase as well, and a similar relationship held for incomes and balances. In the above multiple regression, however, the slope for Income is now -7.66, suggesting a negative relationship between income and credit card balance. What explains these contradictory results?

      +

      This is known as Simpson’s Paradox, a phenomenon in which a trend appears in several different groups of data but disappears or reverses when these groups are combined. We expand on this in Subsection 7.3.2 where we’ll look at the relationship between credit Limit and credit card balance but split by different income bracket groups.
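As a preview of that idea, here is an added sketch of how you might start exploring it yourself; splitting Income into 4 brackets with cut_number() is purely our own choice of grouping:

# Split income into 4 brackets of roughly equal size, then plot within each
Credit %>%
  mutate(income_bracket = cut_number(Income, n = 4)) %>%
  ggplot(aes(x = Limit, y = Balance)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ income_bracket)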

Learning check

(LC7.2) Fit a new multiple regression using lm(Balance ~ Rating + Age, data = Credit) where Rating and Age are the new numerical explanatory variables \(x_1\) and \(x_2\). Get information about the “best-fitting” line from the regression table by applying the get_regression_table() function. How do the regression results match up with the results from your exploratory data analysis above?


      7.1.3 Observed/fitted values and residuals

      +

As we did previously, let’s unpack the output of the get_regression_points() function for our model of credit card balance, shown in Table 7.4, for all 400 card holders in the dataset. Recall that each card holder corresponds to one of the 400 rows in the Credit data frame and also to one of the 400 3D points in the 3D scatterplots in Subsection 7.1.1.

regression_points <- get_regression_points(Balance_model)
regression_points
Table 7.4: Regression points (first 5 rows of 400)

ID  Balance  Limit  Income  Balance_hat  residual
1       333   3606    14.9          454    -120.8
2       903   6645   106.0          559     344.3
3       580   7075   104.6          683    -103.4
4       964   9504   148.9          986     -21.7
5       331   4897    55.9          481    -150.0

      Recall the format of the output:

• Balance corresponds to \(y\) (the observed value)
• Balance_hat corresponds to \(\widehat{y}\) (the fitted value)
• residual corresponds to \(y - \widehat{y}\) (the residual)

      7.1.4 Residual analysis

      +

Recall in Section 6.1.4, our first residual analysis plot investigated the presence of any systematic pattern in the residuals when we had a single numerical predictor: bty_avg. For the Credit card dataset, since we have two numerical predictors, Limit and Income, we must perform this twice:

ggplot(regression_points, aes(x = Limit, y = residual)) +
  geom_point() +
  labs(x = "Credit limit (in $)", y = "Residual", title = "Residuals vs credit limit")

ggplot(regression_points, aes(x = Income, y = residual)) +
  geom_point() +
  labs(x = "Income (in $1000)", y = "Residual", title = "Residuals vs income")

Figure 7.2: Residuals vs credit limit and income

In this case, there does appear to be a systematic pattern to the residuals, as the scatter of the residuals around the line \(y=0\) is definitely not consistent. This behavior of the residuals is further evidenced by the histogram of residuals in Figure 7.3. We observe that the residuals have a slight right-skew (recall we say that data is right-skewed, or positively-skewed, if there is a tail to the right). Ideally, these residuals should be bell-shaped around a residual value of 0.

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(color = "white") +
  labs(x = "Residual")

Figure 7.3: Histogram of residuals

      Another way to interpret this histogram is that since the residual is computed as \(y - \widehat{y}\) = balance - balance_hat, we have some values where the fitted value \(\widehat{y}\) is very much lower than the observed value \(y\). In other words, we are underestimating certain credit card holders’ balances by a very large amount.

Learning check

      (LC7.3) Continuing with our regression using Rating and Age as the explanatory variables and credit card Balance as the outcome variable, use the get_regression_points() function to get the observed values, fitted values, and residuals for all 400 credit card holders. Perform a residual analysis and look for any systematic patterns in the residuals.


      7.2 One numerical & one categorical explanatory variable

      +

      Let’s revisit the instructor evaluation data introduced in Section 6.1, where we studied the relationship between instructor evaluation scores and their beauty scores. This analysis suggested that there is a positive relationship between bty_avg and score, in other words as instructors had higher beauty scores, they also tended to have higher teaching evaluation scores. Now let’s say instead of bty_avg we are interested in the numerical explanatory variable \(x_1\) age and furthermore we want to use a second explanatory variable \(x_2\), the (binary) categorical variable gender.

      +

      Note: This study only focused on the gender binary of "male" or "female" when the data was collected and analyzed years ago. It has been tradition to use gender as an “easy” binary variable in the past in statistical analyses. We have chosen to include it here because of the interesting results of the study, but we also understand that a segment of the population is not included in this dichotomous assignment of gender and we advocate for more inclusion in future studies to show representation of groups that do not identify with the gender binary. We now resume our analyses using this evals data and hope that others find these results interesting and worth further exploration.

      +

      Our modeling scenario now becomes

1. A numerical outcome variable \(y\). As before, instructor evaluation score.
2. Two explanatory variables:
   1. A numerical explanatory variable \(x_1\): in this case, their age.
   2. A categorical explanatory variable \(x_2\): in this case, their binary gender.

      7.2.1 Exploratory data analysis

      +

Let’s reload the evals data and select() only the needed subset of variables. Note that these are different from the variables chosen in Chapter 6. Let’s give this the name evals_ch7.

evals_ch7 <- evals %>%
  select(score, age, gender)

Let’s look at the raw data values both by bringing up RStudio’s spreadsheet viewer and by using the glimpse() function, although in Table 7.5 we only show 5 randomly selected instructors out of 463:

View(evals_ch7)

Table 7.5: Random sample of 5 instructors

score  age  gender
  3.6   34  male
  4.9   43  male
  3.3   47  male
  4.4   33  female
  4.7   60  male

      Let’s look at some summary statistics using the skim() function from the skimr package:

evals_ch7 %>% 
  skim()

Skim summary statistics
 n obs: 463 
 n variables: 3 

── Variable type:factor ──────
 variable missing complete   n n_unique                top_counts ordered
   gender       0      463 463        2 mal: 268, fem: 195, NA: 0   FALSE

── Variable type:integer ─────
 variable missing complete   n  mean  sd p0 p25 p50 p75 p100     hist
      age       0      463 463 48.37 9.8 29  42  48  57   73 ▅▅▅▇▅▇▂▁

── Variable type:numeric ─────
 variable missing complete   n mean   sd  p0 p25 p50 p75 p100     hist
    score       0      463 463 4.17 0.54 2.3 3.8 4.3 4.6    5 ▁▁▂▃▅▇▇▆

Furthermore, let’s compute the correlation between the two numerical variables we have, score and age. Recall that correlation coefficients only exist between numerical variables. We observe that they are weakly negatively correlated.

evals_ch7 %>% 
  get_correlation(formula = score ~ age)

# A tibble: 1 x 1
  correlation
        <dbl>
1      -0.107

      In Figure 7.4, we plot a scatterplot of score over age. Given that gender is a binary categorical variable in this study, we can make some interesting tweaks:

1. We can assign a color to points from each of the two levels of gender: female and male.
2. Furthermore, the geom_smooth(method = "lm", se = FALSE) layer automatically fits a different regression line for each level of gender since we have provided color = gender at the top level in ggplot(). This allows all geom_etries that follow to have the same mapping of aes()thetics to variables throughout the plot.

ggplot(evals_ch7, aes(x = age, y = score, color = gender)) +
  geom_jitter() +
  labs(x = "Age", y = "Teaching Score", color = "Gender") +
  geom_smooth(method = "lm", se = FALSE)

Figure 7.4: Instructor evaluation scores at UT Austin split by gender (jittered)

      We notice some interesting trends:

1. There are almost no women faculty over the age of 60. We can see this by the lack of red dots above 60.
2. Fitting separate regression lines for men and women, we see they have different slopes. We see that the associated effect of increasing age seems to be much harsher for women than for men. In other words, as women age, the drop in their teaching score appears to be faster.

      7.2.2 Multiple regression: Parallel slopes model

      +

Much like we started to consider multiple explanatory variables using the + sign in Subsection 7.1.2, let’s fit a regression model and get the regression table. This time we give the name score_model_2 to our regression model fit, so as to not overwrite the model score_model from Section 6.1.2.

score_model_2 <- lm(score ~ age + gender, data = evals_ch7)
get_regression_table(score_model_2)
Table 7.6: Regression table

term        estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept      4.484      0.125      35.79    0.000     4.238     4.730
age           -0.009      0.003      -3.28    0.001    -0.014    -0.003
gendermale     0.191      0.052       3.63    0.000     0.087     0.294

      The modeling equation for this scenario is:

\[
\begin{align}
\widehat{y} &= b_0 + b_1 \cdot x_1 + b_2 \cdot x_2 \\
\widehat{\mbox{score}} &= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot \mathbb{1}_{\mbox{is male}}(x)
\end{align}
\]

where \(\mathbb{1}_{\mbox{is male}}(x)\) is an indicator function for gender == male. In other words, \(\mathbb{1}_{\mbox{is male}}(x)\) equals one if the current observation corresponds to a male professor, and 0 if the current observation corresponds to a female professor. This model can be visualized in Figure 7.5.

Figure 7.5: Instructor evaluation scores at UT Austin by gender: same slope
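A figure like this one can be reproduced with code along the following lines; this is an added sketch that assumes your version of the moderndive package provides the geom_parallel_slopes() layer:

ggplot(evals_ch7, aes(x = age, y = score, color = gender)) +
  geom_point() +
  labs(x = "Age", y = "Teaching Score", color = "Gender") +
  geom_parallel_slopes(se = FALSE)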


      We see that:

• Females are treated as the baseline for comparison for no other reason than “female” is alphabetically earlier than “male.” The \(b_{male} = 0.1906\) is the vertical “bump” that men get in their teaching evaluation scores. Or more precisely, it is the average difference in teaching score that men get relative to the baseline of women.
• Accordingly, the intercepts (which in this case make no sense since no instructor can have an age of 0) are:
   • for women: \(b_0\) = 4.484
   • for men: \(b_0 + b_{male}\) = 4.484 + 0.191 = 4.675
• Both men and women have the same slope. In other words, in this model the associated effect of age is the same for men and women. So for every increase of one year in age, there is on average an associated change of \(b_{age}\) = -0.009 (a decrease) in teaching score.

But wait, why is Figure 7.5 different from Figure 7.4? What is going on? What we have in the original plot is known as an interaction effect between age and gender. Fitting a separate model for each of men and women, we see that the resulting regression lines are different. Thus, age and gender appear to interact: the associated effect of age differs for men and women.


      7.2.3 Multiple regression: Interaction model

      +

We say a model has an interaction effect if the associated effect of one variable depends on the value of another variable. These types of models usually prove to be tricky to interpret at first glance because of their complexity. In this case, the effect of age will depend on the value of gender. Put differently, the effect of age on teaching scores will differ for men and for women, as was suggested by the different slopes for men and women in our visual exploratory data analysis in Figure 7.4.

      +

Let’s fit a regression with an interaction term. Instead of using the + sign in the enumeration of explanatory variables, we use the * sign. Let’s fit this regression and save it in score_model_interaction, then get the regression table using the get_regression_table() function as before.

score_model_interaction <- lm(score ~ age * gender, data = evals_ch7)
get_regression_table(score_model_interaction)
Table 7.7: Regression table

term            estimate  std_error  statistic  p_value  lower_ci  upper_ci
intercept          4.883      0.205      23.80    0.000     4.480     5.286
age               -0.018      0.004      -3.92    0.000    -0.026    -0.009
gendermale        -0.446      0.265      -1.68    0.094    -0.968     0.076
age:gendermale     0.014      0.006       2.45    0.015     0.003     0.024

      The modeling equation for this scenario is:

\[
\begin{align}
\widehat{y} &= b_0 + b_1 \cdot x_1 + b_2 \cdot x_2 + b_3 \cdot x_1 \cdot x_2\\
\widehat{\mbox{score}} &= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot \mathbb{1}_{\mbox{is male}}(x) + b_{\mbox{age,male}} \cdot \mbox{age} \cdot \mathbb{1}_{\mbox{is male}}(x)
\end{align}
\]

      +

      Oof, that’s a lot of rows in the regression table output and a lot of terms in the model equation. The fourth term being added on the right hand side of the equation corresponds to the interaction term. Let’s simplify things by considering men and women separately. First, recall that \(\mathbb{1}_{\mbox{is male}}(x)\) equals 1 if a particular observation (or row in evals_ch7) corresponds to a male instructor. In this case, using the values from the regression table the fitted value of \(\widehat{\mbox{score}}\) is:

\[
\begin{align}
\widehat{\mbox{score}} &= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot \mathbb{1}_{\mbox{is male}}(x) + b_{\mbox{age,male}} \cdot \mbox{age} \cdot \mathbb{1}_{\mbox{is male}}(x) \\
&= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot 1 + b_{\mbox{age,male}} \cdot \mbox{age} \cdot 1 \\
&= \left(b_0 + b_{\mbox{male}}\right) + \left(b_{\mbox{age}} + b_{\mbox{age,male}} \right) \cdot \mbox{age} \\
&= \left(4.883 - 0.446\right) + \left(-0.018 + 0.014 \right) \cdot \mbox{age} \\
&= 4.437 - 0.004 \cdot \mbox{age}
\end{align}
\]

      +

      Second, recall that \(\mathbb{1}_{\mbox{is male}}(x)\) equals 0 if a particular observation corresponds to a female instructor. Again, using the values from the regression table the fitted value of \(\widehat{\mbox{score}}\) is:

\[
\begin{align}
\widehat{\mbox{score}} &= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot \mathbb{1}_{\mbox{is male}}(x) + b_{\mbox{age,male}} \cdot \mbox{age} \cdot \mathbb{1}_{\mbox{is male}}(x) \\
&= b_0 + b_{\mbox{age}} \cdot \mbox{age} + b_{\mbox{male}} \cdot 0 + b_{\mbox{age,male}} \cdot \mbox{age} \cdot 0 \\
&= b_0 + b_{\mbox{age}} \cdot \mbox{age}\\
&= 4.883 - 0.018 \cdot \mbox{age}
\end{align}
\]

      +

      Let’s summarize these values in a table:

Table 7.8: Comparison of male and female intercepts and age slopes

Gender              Intercept  Slope for age
Male instructors         4.44         -0.004
Female instructors       4.88         -0.018

We see that while male instructors have a lower intercept, as they age they have a less steep associated average decrease in teaching scores: -0.004 teaching score units per year as opposed to -0.018 for women. This is consistent with the different slopes and intercepts of the red and blue regression lines fit in Figure 7.4. Recall our definition of a model having an interaction effect: when the associated effect of one variable, in this case age, depends on the value of another variable, in this case gender.
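These intercepts and slopes can also be recovered directly from the fitted model's coefficients. The following is an added sketch using base R's coef():

coefs <- coef(score_model_interaction)

# Female instructors: baseline intercept and slope
c(intercept = unname(coefs["(Intercept)"]), slope = unname(coefs["age"]))
# Male instructors: add the gendermale and age:gendermale offsets
c(intercept = unname(coefs["(Intercept)"] + coefs["gendermale"]),
  slope = unname(coefs["age"] + coefs["age:gendermale"]))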

      +

      But how do we know when it’s appropriate to include an interaction effect? For example, which is the more appropriate model? The regular multiple regression model without an interaction term we saw in Section 7.2.2 or the multiple regression model with the interaction term we just saw? We’ll revisit this question in Chapter 11 on “inference for regression.”


      7.2.4 Observed/fitted values and residuals

      +

      Now say we want to apply the above calculations for male and female instructors for all 463 instructors in the evals_ch7 dataset. As our multiple regression models get more and more complex, computing such values by hand gets more and more tedious. The get_regression_points() function spares us this tedium and returns all fitted values and all residuals. For simplicity, let’s focus only on the fitted interaction model, which is saved in score_model_interaction.

regression_points <- get_regression_points(score_model_interaction)
regression_points
Table 7.9: Regression points (first 5 rows of 463)

ID  score  age  gender  score_hat  residual
1     4.7   36  female       4.25     0.448
2     4.1   36  female       4.25    -0.152
3     3.9   36  female       4.25    -0.352
4     4.8   36  female       4.25     0.548
5     4.6   59  male         4.20     0.399

      Recall the format of the output:

• score corresponds to \(y\) the observed value
• score_hat corresponds to \(\widehat{y} = \widehat{\mbox{score}}\) the fitted value
• residual corresponds to the residual \(y - \widehat{y}\)

      7.2.5 Residual analysis

      +

      As always, let’s perform a residual analysis first with a histogram, which we can facet by gender:

ggplot(regression_points, aes(x = residual)) +
  geom_histogram(binwidth = 0.25, color = "white") +
  labs(x = "Residual") +
  facet_wrap(~gender)

Figure 7.6: Interaction model histogram of residuals

      Second, the residuals as compared to the predictor variables:

• \(x_1\): numerical explanatory/predictor variable of age
• \(x_2\): categorical explanatory/predictor variable of gender

ggplot(regression_points, aes(x = age, y = residual)) +
  geom_point() +
  labs(x = "age", y = "Residual") +
  geom_hline(yintercept = 0, col = "blue", size = 1) +
  facet_wrap(~ gender)

Figure 7.7: Interaction model residuals vs predictor

      7.4 Conclusion


      7.4.1 What’s to come?

      +

      Congratulations! We’re ready to proceed to the third portion of this book: “statistical inference” using a new package called infer. Once we’ve covered Chapters 8 on sampling, 9 on confidence intervals, and 10 on hypothesis testing, we’ll come back to the models we’ve seen in “data modeling” in Chapter 11 on inference for regression. As we said at the end of Chapter 6, we’ll see why we’ve been conducting the residual analyses from Subsections 7.1.4 and 7.2.5. We are actually verifying some very important assumptions that must be met for the std_error (standard error), p_value, conf_low and conf_high (the end-points of the confidence intervals) columns in our regression tables to have valid interpretation.

      +

      Up next:


      7.4.2 Script of R code

      +

      An R script file of all R code used in this chapter is available here.


      8 Sampling

      +

      In this chapter we kick off the third segment of this book, statistical inference, by learning about sampling. The concepts behind sampling form the basis of confidence intervals and hypothesis testing, which we’ll cover in Chapters 9 and 10 respectively. We will see that the tools that you learned in the data science segment of this book (data visualization, “tidy” data format, and data wrangling) will also play an important role here in the development of your understanding. As mentioned before, the concepts throughout this text all build into a culmination allowing you to “think with data.”


      Needed packages

      +

      Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). If needed, read Section 2.3 for information on how to install and load R packages.

library(dplyr)
library(ggplot2)
library(moderndive)

      8.1 Introduction to sampling

      +

      Let’s kick off this chapter immediately with an exercise that involves sampling. Imagine you are given a large bowl with 2400 balls that are either red or white. We are interested in the proportion of balls in this bowl that are red, but you don’t have the time to do an exhaustive count. You are also given a “shovel” that you can insert into this bowl…

Figure 8.1: A bowl with 2400 balls

      … and extract a sample of 50 balls:

Figure 8.2: A shovel used to extract a sample of size n = 50

      Inference via sampling

      +

      Why did we go through the trouble of enumerating all the above concepts and terminology?

      +

      The moral of the story:

- If the sampling of a sample of size \(n\) is done at random, then
- The sample is unbiased and representative of the population, thus
- Any result based on the sample can generalize to the population, thus
- The point estimate/sample statistic is a “good guess” of the unknown population parameter of interest

      and thus we have inferred about the population based on our sample. In the above example:

- If we properly mix the balls by say stirring the bowl first, then use the shovel to extract a sample of size \(n=50\), then
- The contents of the shovel will “look like” the contents of the bowl, thus
- Any results based on the sample of \(n=50\) balls can generalize to the large bowl of \(N=2400\) balls, thus
- The sample proportion \(\widehat{p}\) of the \(n=50\) balls in the shovel that are red is a “good guess” of the true population proportion \(p\) of the \(N=2400\) balls that are red.

      and thus we have inferred some new piece of information about the bowl based on our sample extracted by shovel.


At this point, you might be saying to yourself: “Big deal, why do we care about this bowl?” As hopefully you’ll soon come to appreciate, this sampling bowl exercise is merely a simulation representing the reality of many important sampling scenarios in a simplified and accessible setting. One sampling scenario in particular is familiar to many: polling. Whether for market research or for political purposes, polls inform much of the world’s decision and opinion making, and understanding the mechanism behind them can better inform your statistical citizenship. We’ll tie in everything we learn in this chapter with an example relating to a 2013 poll on President Obama’s approval ratings among young adults in Section 8.4.


      8.2 Tactile sampling simulation


Let’s start by revisiting the tactile sampling illustrated with the “sampling bowl” in Figures 8.1 and 8.2. By tactile we mean with your hands and to the touch. We’ll break down the act of tactile sampling from the bowl with the shovel using our newly acquired concepts and terminology relating to sampling. In particular we’ll study how sampling variability affects outcomes, which we’ll illustrate through simulations of repeated sampling. To this end, we’ll use both the above-mentioned tactile simulation and a virtual simulation. By virtual we mean on the computer.


      8.2.1 Using shovel once


      Let’s now view our shovel through the lens of sampling with the following 3-step tactile sampling simulation:


Step 1: Use the shovel to take a sample of size \(n=50\) balls from the bowl as seen in Figure 8.3.

Figure 8.3: Step 1: Take sample of size \(n=50\)

      Step 2: Pour them into a cup and

- Count the number that are red, then
- Compute the sample proportion \(\widehat{p}\) of the \(n=50\) balls that are red

as seen in Figure 8.4 below. Note from the figure that there are 18 balls out of \(n=50\) that are red. The sample proportion red for this particular sample is thus \(\widehat{p} = 18 / 50 = 0.36\).

Figure 8.4: Step 2: Pour into Red Solo Cup and compute \(\widehat{p}\)

      Step 3: Mark the sample proportion \(\widehat{p}\) in a hand-drawn histogram, just like our intrepid students are doing in Figure 8.5.

Figure 8.5: Step 3: Mark \(\widehat{p}\)’s in histogram

      Repeat Steps 1-3 a few times: After a few groups of students complete this exercise, let’s draw the resulting histogram by hand. In Figure 8.6 we have the resulting hand-drawn histogram for 10 groups of students.

Figure 8.6: Step 3: Histogram of 10 values of \(\widehat{p}\)

      Observe the behavior of the 10 different values of the sample proportion \(\widehat{p}\) in the histogram of their distribution, in particular where the values center and how much they spread out, in other words how much they vary. Note:

- The lowest value of \(\widehat{p}\) was somewhere between 0.20 and 0.25.
- The highest value of \(\widehat{p}\) was somewhere between 0.45 and 0.50.
- Five of the sample proportions \(\widehat{p}\) clustered together: five different samples of size \(n=50\) yielded sample proportions in the range 0.30 to 0.35.

      Let’s now look at some real-life outcomes of this tactile sampling simulation. We present the actual results for not 10 groups of students, but 33 groups of students below!


      8.2.2 Using shovel 33 times


All told, 33 groups took samples. In other words, the shovel was used 33 times and 33 values of the sample proportion \(\widehat{p}\) were computed; this data is saved in the tactile_prop_red data frame included in the moderndive package. Let’s display its contents in the table below. Notice how the replicate column enumerates each of the 33 groups, red_balls is the count of balls in the sample of size \(n=50\) that were red, and prop_red is the resulting sample proportion \(\widehat{p}\) red.

```r
tactile_prop_red
View(tactile_prop_red)
```

| group | replicate | red_balls | prop_red |
|---|---:|---:|---:|
| Ilyas, Yohan | 1 | 21 | 0.42 |
| Morgan, Terrance | 2 | 17 | 0.34 |
| Martin, Thomas | 3 | 21 | 0.42 |
| Clark, Frank | 4 | 21 | 0.42 |
| Riddhi, Karina | 5 | 18 | 0.36 |
| Andrew, Tyler | 6 | 19 | 0.38 |
| Julia | 7 | 19 | 0.38 |
| Rachel, Lauren | 8 | 11 | 0.22 |
| Daniel, Caroline | 9 | 15 | 0.30 |
| Josh, Maeve | 10 | 17 | 0.34 |
| Emily, Emily | 11 | 16 | 0.32 |
| Conrad, Emily | 12 | 18 | 0.36 |
| Oliver, Erik | 13 | 17 | 0.34 |
| Isabel, Nam | 14 | 21 | 0.42 |
| X, Claire | 15 | 15 | 0.30 |
| Cindy, Kimberly | 16 | 20 | 0.40 |
| Kevin, James | 17 | 11 | 0.22 |
| Nam, Isabelle | 18 | 21 | 0.42 |
| Harry, Yuko | 19 | 15 | 0.30 |
| Yuki, Eileen | 20 | 16 | 0.32 |
| Ramses | 21 | 23 | 0.46 |
| Joshua, Elizabeth, Stanley | 22 | 15 | 0.30 |
| Siobhan, Jane | 23 | 18 | 0.36 |
| Jack, Will | 24 | 16 | 0.32 |
| Caroline, Katie | 25 | 21 | 0.42 |
| Griffin, Y | 26 | 18 | 0.36 |
| Kaitlin, Jordan | 27 | 17 | 0.34 |
| Ella, Garrett | 28 | 18 | 0.36 |
| Julie, Hailin | 29 | 15 | 0.30 |
| Katie, Caroline | 30 | 21 | 0.42 |
| Mallory, Damani, Melissa | 31 | 21 | 0.42 |
| Katie | 32 | 16 | 0.32 |
| Francis, Vignesh | 33 | 19 | 0.38 |

Using your data visualization skills that you honed in Chapter 3, let’s visualize the distribution of these 33 sample proportions red \(\widehat{p}\) using a histogram with binwidth = 0.05. This visualization is appropriate since prop_red is a numerical variable. This histogram shows a particularly important type of distribution in statistics: the sampling distribution.

```r
ggplot(tactile_prop_red, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, color = "white") +
  labs(x = "Sample proportion red based on n = 50", title = "Sampling distribution of p-hat")
```

Figure 8.7: Sampling distribution of 33 sample proportions based on 33 tactile samples with n=50

      Sampling distributions are a specific kind of distribution: distributions of point estimates/sample statistics based on samples of size \(n\) used to estimate an unknown population parameter.


In the case of the histogram in Figure 8.7, it’s the distribution of the sample proportion red \(\widehat{p}\) based on \(n=50\) sampled balls from the bowl, with which we want to estimate the unknown population proportion \(p\) of the \(N=2400\) balls that are red. Sampling distributions describe how values of the sample proportion red \(\widehat{p}\) will vary from sample to sample due to sampling variability and thus identify “typical” and “atypical” values of \(\widehat{p}\). For example:

- Obtaining a sample that yields \(\widehat{p} = 0.36\) would be considered typical, common, and plausible since it would in theory occur frequently.
- Obtaining a sample that yields \(\widehat{p} = 0.8\) would be considered atypical, uncommon, and implausible since it lies far away from most of the distribution.

      Let’s now ask ourselves the following questions:

1. Where is the sampling distribution centered?
2. What is the spread of this sampling distribution?

Recall from Section 5.4 that the mean and the standard deviation are two summary statistics that answer these questions:

```r
tactile_prop_red %>% 
  summarize(mean = mean(prop_red), sd = sd(prop_red))
```

| mean | sd |
|---:|---:|
| 0.356 | 0.058 |

      Finally, it’s important to keep in mind:

1. If the sampling is done in an unbiased and random fashion, in other words we made sure to stir the bowl before we sampled, then the sampling distribution will be guaranteed to be centered at the true unknown population proportion red \(p\), in other words the true proportion of the 2400 balls that are red.
2. The spread of this histogram, as quantified by the standard deviation of 0.058, is called the standard error. It quantifies the variability of our estimates for \(\widehat{p}\).
    - Note: a common source of confusion. All standard errors are a form of standard deviation, but not all standard deviations are standard errors.

      8.3 Virtual sampling simulation


      Now let’s mimic the above tactile sampling, but with virtual sampling. We’ll resort to virtual sampling because while collecting 33 tactile samples manually is feasible, for large numbers like 1000, things start getting tiresome! That’s where a computer can really help: computers excel at performing mundane tasks repeatedly; think of what accounting software must be like!


      In other words:

- Instead of considering the tactile bowl shown in Figure 8.1 above and using a tactile shovel to draw samples of size \(n=50\),
- Let’s use a virtual bowl saved in a computer and use R’s random number generator as a virtual shovel to draw samples of size \(n=50\).

      First, we describe our virtual bowl. In the moderndive package, we’ve included a data frame called bowl that has 2400 rows corresponding to the \(N=2400\) balls in the physical bowl. Run View(bowl) in RStudio to convince yourselves that bowl is indeed a virtual version of the tactile bowl in the previous section.

```r
bowl
```

```
# A tibble: 2,400 x 2
   ball_ID color
     <int> <chr>
 1       1 white
 2       2 white
 3       3 white
 4       4 red  
 5       5 white
 6       6 white
 7       7 red  
 8       8 white
 9       9 red  
10      10 white
# … with 2,390 more rows
```

      Note that the balls are not actually marked with numbers; the variable ball_ID is merely used as an identification variable for each row of bowl. Recall our previous discussion on identification variables in Subsection 4.2.2 in the “Data Tidying” Chapter 4.
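
Since bowl is just a data frame, we could in principle perform a virtual “census” and compute the true proportion of red balls directly. The following is a minimal sketch of that idea, shown only as an aside: the whole point of this chapter is that a census is usually impractical, so we otherwise treat \(p\) as unknown.

```r
# A virtual "census" of the bowl: count all N = 2400 balls and compute the
# true proportion red p. (Shown only as an aside; we otherwise treat p as unknown.)
bowl %>% 
  summarize(N = n(), p = mean(color == "red"))
```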


Next, we describe our virtual shovel: the rep_sample_n() function included in the moderndive package, whose name indicates that we are taking repeated/replicated samples of size \(n\).


      8.3.1 Using shovel once


Let’s perform the virtual analogue of tactilely inserting the shovel only once into the bowl and extracting a sample of size \(n=50\). In the table below we only show the first 10 of the 50 sampled balls.

```r
virtual_shovel <- bowl %>% 
  rep_sample_n(size = 50)
View(virtual_shovel)
```

Table 8.1: First 10 sampled balls of 50 in virtual sample

| replicate | ball_ID | color |
|---:|---:|---|
| 1 | 2079 | red |
| 1 | 1076 | white |
| 1 | 1691 | red |
| 1 | 1687 | red |
| 1 | 1434 | white |
| 1 | 954 | white |
| 1 | 483 | white |
| 1 | 1520 | white |
| 1 | 2060 | red |
| 1 | 1682 | white |

Looking at all 50 rows of virtual_shovel in the spreadsheet viewer that pops up after running View(virtual_shovel) in RStudio, the ball_ID variable seems to suggest that we do indeed have a random sample of \(n=50\) balls. However, what does the replicate variable indicate, where in this case it’s equal to 1 for all 50 rows? We’ll see in a minute. First let’s compute both the number of red balls and the proportion red out of \(n=50\) using our dplyr data wrangling tools from Chapter 5:

```r
virtual_shovel %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 50)
```

Table 8.2: Count and proportion red in single virtual sample of size n = 50

| replicate | red | prop_red |
|---:|---:|---:|
| 1 | 23 | 0.46 |

Why does this work? Because for every row where color == "red", the Boolean TRUE is returned and R treats TRUE like the number 1. Equivalently, for every row where color is not equal to "red", the Boolean FALSE is returned and R treats FALSE like the number 0. So summing the TRUEs and FALSEs is equivalent to summing 1’s and 0’s, which counts the number of balls where color is red.
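
To see this coercion of TRUE/FALSE to 1/0 in isolation, here is a small sketch using a toy vector; the vector is purely illustrative and not part of the bowl data.

```r
# sum() on a logical vector counts the TRUEs; mean() gives the proportion of TRUEs.
toy_colors <- c("red", "white", "white", "red", "white")
toy_colors == "red"        # TRUE FALSE FALSE TRUE FALSE
sum(toy_colors == "red")   # 2
mean(toy_colors == "red")  # 0.4
```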


      8.3.2 Using shovel 33 times


      Recall however in our tactile sampling exercise in Section 8.2 above that we had 33 groups of students take 33 samples total of size \(n=50\) using the shovel 33 times and hence compute 33 separate values of the sample proportion red \(\widehat{p}\). In other words we repeated/replicated the sampling 33 times. We can achieve this by reusing the same rep_sample_n() function code above, but by adding the reps = 33 argument indicating we want to repeat this sampling 33 times:

```r
virtual_samples <- bowl %>% 
  rep_sample_n(size = 50, reps = 33)
View(virtual_samples)
```

      virtual_samples has \(50 \times 33 = 1650\) rows, corresponding to 33 samples of size \(n=50\), or 33 draws from the shovel. We won’t display the contents of this data frame but leave it to you to View() this data frame. You’ll see that the first 50 rows have replicate equal to 1, then the next 50 rows have replicate equal to 2, and so on and so forth, up until the last 50 rows which have replicate equal to 33. The replicate variable denotes which of our 33 samples a particular ball is included in.
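
Rather than scrolling through all 1650 rows, one quick way to convince yourself of this structure is to count the number of rows in each replicate; a short sketch:

```r
# Each of the 33 replicates should contribute exactly 50 rows, one per sampled ball.
virtual_samples %>% 
  count(replicate)
```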


Now let’s compute the 33 corresponding values of the sample proportion \(\widehat{p}\) based on 33 different samples of size \(n=50\) by reusing the previous code, but remembering to group_by the replicate variable first since we want to compute the sample proportion for each of the 33 samples separately. Notice the similarity of the resulting table with the table of 33 tactile sample proportions above.

```r
virtual_prop_red <- virtual_samples %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 50)
View(virtual_prop_red)
```

| replicate | red | prop_red |
|---:|---:|---:|
| 1 | 17 | 0.34 |
| 2 | 20 | 0.40 |
| 3 | 24 | 0.48 |
| 4 | 20 | 0.40 |
| 5 | 17 | 0.34 |
| 6 | 16 | 0.32 |
| 7 | 17 | 0.34 |
| 8 | 19 | 0.38 |
| 9 | 19 | 0.38 |
| 10 | 12 | 0.24 |
| 11 | 22 | 0.44 |
| 12 | 17 | 0.34 |
| 13 | 20 | 0.40 |
| 14 | 22 | 0.44 |
| 15 | 13 | 0.26 |
| 16 | 15 | 0.30 |
| 17 | 23 | 0.46 |
| 18 | 20 | 0.40 |
| 19 | 16 | 0.32 |
| 20 | 12 | 0.24 |
| 21 | 14 | 0.28 |
| 22 | 21 | 0.42 |
| 23 | 14 | 0.28 |
| 24 | 18 | 0.36 |
| 25 | 19 | 0.38 |
| 26 | 12 | 0.24 |
| 27 | 22 | 0.44 |
| 28 | 23 | 0.46 |
| 29 | 19 | 0.38 |
| 30 | 18 | 0.36 |
| 31 | 20 | 0.40 |
| 32 | 17 | 0.34 |
| 33 | 20 | 0.40 |

Just as we did before, let’s now visualize the sampling distribution of the 33 virtual sample proportions \(\widehat{p}\) using a histogram with binwidth = 0.05:

```r
ggplot(virtual_prop_red, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, color = "white") +
  labs(x = "Sample proportion red based on n = 50", title = "Sampling distribution of p-hat")
```

Figure 8.8: Sampling distribution of 33 sample proportions based on 33 virtual samples with n=50

The resulting sampling distribution based on our virtual sampling simulation is nearly identical to the sampling distribution of our tactile sampling simulation from Section 8.2. Let’s compare them side-by-side in Figure 8.9.

Figure 8.9: Comparison of sampling distributions based on 33 tactile & virtual samples with n=50

      We see that they are similar in terms of center and spread, although not identical due to random variation. This was in fact by design, as we made the virtual contents of the virtual bowl match the actual contents of the actual bowl pictured above.
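
If you’d like to make this comparison numerically precise rather than visual, one option is to stack the two sets of 33 sample proportions and compare their means and standard deviations; a sketch, where the type column is simply a label we add here:

```r
# Compare center and spread of the tactile and virtual sampling distributions.
bind_rows(
  tactile_prop_red %>% mutate(type = "tactile"),
  virtual_prop_red %>% mutate(type = "virtual")
) %>% 
  group_by(type) %>% 
  summarize(mean = mean(prop_red), sd = sd(prop_red))
```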


      8.3.3 Using shovel 1000 times


      In Figure 8.8, we can start seeing a pattern in the sampling distribution emerge. However, 33 values of the sample proportion \(\widehat{p}\) might not be enough to get a true sense of the distribution. Using 1000 values of \(\widehat{p}\) would definitely give a better sense. What are our two options for constructing these histograms?

1. Tactile sampling: Make the 33 groups of students take \(1000 / 33 \approx 31\) samples of size \(n=50\) each, count the number of red balls for each of the 1000 tactile samples, and then compute the 1000 corresponding values of the sample proportion \(\widehat{p}\). However, this would be cruel and unusual as this would take hours!
2. Virtual sampling: Computers are very good at automating repetitive tasks such as this one. This is the way to go!

      First, generate 1000 samples of size \(n=50\)

```r
virtual_samples <- bowl %>% 
  rep_sample_n(size = 50, reps = 1000)
View(virtual_samples)
```

      Then for each of these 1000 samples of size \(n=50\), compute the corresponding sample proportions

```r
virtual_prop_red <- virtual_samples %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 50)
View(virtual_prop_red)
```

      As previously done, let’s plot the sampling distribution of these 1000 simulated values of the sample proportion red \(\widehat{p}\) with a histogram in Figure 8.10.

```r
ggplot(virtual_prop_red, aes(x = prop_red)) +
  geom_histogram(binwidth = 0.05, color = "white") +
  labs(x = "Sample proportion red based on n = 50", title = "Sampling distribution of p-hat")
```

Figure 8.10: Sampling distribution of 1000 sample proportions based on 1000 virtual samples with n=50

      Since the sampling is random and thus representative and unbiased, the above sampling distribution is centered at the true population proportion red \(p\) of all \(N=2400\) balls in the bowl. Eyeballing it, the sampling distribution appears to be centered at around 0.375.
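
Rather than eyeballing the center, we can compute it; a minimal sketch:

```r
# The mean of the 1000 sample proportions estimates the center of the sampling
# distribution, which for random sampling sits at the true proportion p.
virtual_prop_red %>% 
  summarize(mean_p_hat = mean(prop_red))
```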


      What is the standard error of the above sampling distribution of \(\widehat{p}\) based on 1000 samples of size \(n=50\)?

```r
virtual_prop_red %>% 
  summarize(SE = sd(prop_red))
```

```
# A tibble: 1 x 1
      SE
   <dbl>
1 0.0698
```

      What this value is saying might not be immediately apparent by itself to someone who is new to sampling. It’s best to first compare different standard errors for different sampling schemes based on different sample sizes \(n\). We’ll do so for samples of size \(n=25\), \(n=50\), and \(n=100\) next.


      8.3.4 Using different shovels


Recall, the sampling we just did on the computer using the rep_sample_n() function is simply a virtual version of the act of taking a tactile sample using the shovel with \(n=50\) slots seen in Figure 8.11. We visualized the variation in the resulting sample proportion red \(\widehat{p}\) in a histogram of the sampling distribution and quantified this variation using the standard error.

Figure 8.11: Tactile shovel for sampling n = 50 balls

But what if we changed the sample size to \(n=25\)? This would correspond to sampling using the shovel with \(n=25\) slots seen in Figure 8.12. What differences, if any, would you notice about the sampling distribution and the standard error?

Figure 8.12: Tactile shovel for sampling n = 25 balls

Furthermore, what if we took samples of size \(n=100\) as well? This would correspond to sampling using the shovel with \(n=100\) slots seen in Figure 8.13. What differences, if any, would you notice about the sampling distribution and the standard error for \(n=100\) as compared to \(n=50\) and \(n=25\)?

Figure 8.13: Tactile shovel for sampling n = 100 balls

      Let’s take the opportunity to review our sampling procedure and do this for 1000 virtual samples of size \(n=25\), \(n=50\), \(n=100\) each.


      Shovel with \(n=50\) slots: Take 1000 virtual samples of size \(n=50\), mimicking the act of taking 1000 tactile samples using the shovel with \(n=50\) slots:

```r
virtual_samples_50 <- bowl %>% 
  rep_sample_n(size = 50, reps = 1000)
```

      Then based on each of these 1000 virtual samples of size \(n=50\), compute the corresponding 1000 sample proportions \(\widehat{p}\) being sure to divide by 50:

```r
virtual_prop_red_50 <- virtual_samples_50 %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 50)
```

      The standard error is the standard deviation of the 1000 sample proportions \(\widehat{p}\), in other words we are quantifying how much \(\widehat{p}\) varies from sample-to-sample based on samples of size \(n=50\) due to sampling variation.

```r
virtual_prop_red_50 %>% 
  summarize(SE = sd(prop_red))
```

```
# A tibble: 1 x 1
      SE
   <dbl>
1 0.0694
```

      Shovel with \(n=25\) slots: Take 1000 virtual samples of size \(n=25\), mimicking the act of taking 1000 tactile samples using the shovel with \(n=25\) slots:

```r
virtual_samples_25 <- bowl %>% 
  rep_sample_n(size = 25, reps = 1000)
```

Then based on each of these 1000 virtual samples of size \(n=25\), compute the corresponding 1000 sample proportions \(\widehat{p}\) being sure to divide by 25:

```r
virtual_prop_red_25 <- virtual_samples_25 %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 25)
```

      The standard error is the standard deviation of the 1000 sample proportions \(\widehat{p}\), in other words we are quantifying how much \(\widehat{p}\) varies from sample-to-sample based on samples of size \(n=25\) due to sampling variation.

```r
virtual_prop_red_25 %>% 
  summarize(SE = sd(prop_red))
```

```
# A tibble: 1 x 1
     SE
  <dbl>
1 0.100
```

      Shovel with \(n=100\) slots: Take 1000 virtual samples of size \(n=100\), mimicking the act of taking 1000 tactile samples using the shovel with \(n=100\) slots:

```r
virtual_samples_100 <- bowl %>% 
  rep_sample_n(size = 100, reps = 1000)
```

      Then based on each of these 1000 virtual samples of size \(n=100\), compute the corresponding 1000 sample proportions \(\widehat{p}\) being sure to divide by 100:

```r
virtual_prop_red_100 <- virtual_samples_100 %>% 
  group_by(replicate) %>% 
  summarize(red = sum(color == "red")) %>% 
  mutate(prop_red = red / 100)
```

      The standard error is the standard deviation of the 1000 sample proportions \(\widehat{p}\), in other words we are quantifying how much \(\widehat{p}\) varies from sample-to-sample based on samples of size \(n=100\) due to sampling variation.

```r
virtual_prop_red_100 %>% 
  summarize(SE = sd(prop_red))
```

```
# A tibble: 1 x 1
      SE
   <dbl>
1 0.0457
```

Comparison: Let’s compare the 3 standard errors we computed above in the table below:

| n | SE |
|---:|---:|
| 25 | 0.100 |
| 50 | 0.069 |
| 100 | 0.046 |

Observe the behavior of the standard error: as \(n\) increases from \(n=25\) to \(n=50\) to \(n=100\), the standard error gets smaller. In other words, the values of \(\widehat{p}\) vary less. The standard error is a numerical quantification of the spreads of the following three histograms (on the same scale) of the sampling distribution of the sample proportion \(\widehat{p}\):

Figure 8.14: Comparing sampling distributions of p-hat for different sample sizes n

Observe that the histogram of possible \(\widehat{p}\) values is narrowest and most consistent for the \(n=100\) case. In other words, those estimates make less error. “Bigger sample size equals better sampling” is a concept you probably knew before reading this chapter. What we’ve just demonstrated is what this concept means: samples based on large sample sizes will yield point estimates that vary less around the true value and hence be less prone to error.
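
For sample proportions there is also a well-known formula for the standard error, \(\text{SE} \approx \sqrt{p(1-p)/n}\), that displays exactly this behavior. As a rough check, assuming the true proportion is near the 0.375 we eyeballed earlier (an assumption, not a value we computed exactly), the formula gives values close to the simulated standard errors in the table above; a sketch:

```r
# Rough theoretical check: SE ~ sqrt(p * (1 - p) / n), using an assumed p of about 0.375.
p <- 0.375                 # assumed ballpark value of the true proportion red
n <- c(25, 50, 100)
data.frame(n = n, theoretical_SE = sqrt(p * (1 - p) / n))
```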


In the case of our sampling bowl, the sample proportion red \(\widehat{p}\) based on samples of size \(n=100\) will vary the least around the true proportion \(p\) of the balls that are red, and thus be less prone to error. In the case of polls, as we study in the next section: representative polls based on a larger number of respondents will yield guesses that tend to be closer to the truth.


      8.4 In real-life sampling: Polls


On December 4, 2013, National Public Radio reported on a recent poll of President Obama’s approval rating among young Americans aged 18-29 in an article, Poll: Support For Obama Among Young Americans Eroding. A quote from the article:


      After voting for him in large numbers in 2008 and 2012, young Americans are souring on President Obama.


      According to a new Harvard University Institute of Politics poll, just 41 percent of millennials — adults ages 18-29 — approve of Obama’s job performance, his lowest-ever standing among the group and an 11-point drop from April.


Let’s tie together elements of this story using the concepts and terminology we learned at the outset of this chapter, along with our observations from the tactile and virtual sampling simulations:

1. Population: Who is the population of \(N\) observations of interest?
    - Bowl: \(N=2400\) identically-shaped balls
    - Obama poll: \(N = \text{?}\) young Americans aged 18-29
2. Population parameter: What is the population parameter?
    - Bowl: The true population proportion \(p\) of the balls in the bowl that are red.
    - Obama poll: The true population proportion \(p\) of young Americans who approve of Obama’s job performance.
3. Census: What would a census be in this case?
    - Bowl: Manually going over all \(N=2400\) balls and exactly computing the population proportion \(p\) of the balls that are red.
    - Obama poll: Locating all \(N = \text{?}\) young Americans (which is in the millions) and asking them if they approve of Obama’s job performance. This would be quite expensive to do!
4. Sampling: How do you acquire the sample of size \(n\) observations?
    - Bowl: Using the shovel to extract a sample of \(n=50\) balls.
    - Obama poll: One way would be to get phone records from a database and pick out \(n\) phone numbers. In the case of the above poll, the sample was of size \(n=2089\) young adults.
5. Point estimates/sample statistics: What is the summary statistic based on the sample of size \(n\) that estimates the unknown population parameter?
    - Bowl: The sample proportion \(\widehat{p}\) red of the balls in the sample of size \(n=50\).
    - Obama poll: The sample proportion red \(\widehat{p}\) of young Americans in the sample of size \(n=2089\) that approve of Obama’s job performance. In this study’s case, \(\widehat{p} = 0.41\), which is the quoted 41% figure in the article.
6. Representative sampling: Is the sampling procedure representative? In other words, do the resulting samples “look like” the population?
    - Bowl: Does our sample of \(n=50\) balls “look like” the contents of the larger set of \(N=2400\) balls in the bowl?
    - Obama poll: Does our sample of \(n=2089\) young Americans “look like” the population of all young Americans aged 18-29?
7. Generalizability: Are the samples generalizable to the greater population?
    - Bowl: Is \(\widehat{p}\) a “good guess” of \(p\)?
    - Obama poll: Is \(\widehat{p} = 0.41\) a “good guess” of \(p\)? In other words, can we confidently say that 41% of all young Americans approve of Obama?
8. Bias: Is the sampling procedure unbiased? In other words, do all observations have an equal chance of being included in the sample?
    - Bowl: Here, we would say it is unbiased. All balls are equally sized, as evidenced by the slots of the \(n=50\) shovel, and thus no particular color of ball can be favored in our samples over others.
    - Obama poll: Did all young Americans have an equal chance at being represented in this poll? For example, if this was conducted using a database of only mobile phone numbers, would people without mobile phones be included? What about if this were an internet poll on a certain news website? Would non-readers of this website be included?
9. Random sampling: Was the sampling random?
    - Bowl: As long as you mixed the bowl sufficiently before sampling, your samples would be random.
    - Obama poll: Random sampling is a necessary assumption for all of the above to work. Most articles reporting on polls take this assumption as granted. In our Obama poll, you’d have to ask the group that conducted the poll: the Harvard University Institute of Politics.

      Recall the punchline of all the above:

- If the sampling of a sample of size \(n\) is done at random, then
- The sample is unbiased and representative of the population, thus
- Any result based on the sample can generalize to the population, thus
- The point estimate/sample statistic is a “good guess” of the unknown population parameter of interest

      and thus we have inferred about the population based on our sample. In the bowl example:

- If we properly mix the balls by say stirring the bowl first, then use the shovel to extract a sample of size \(n=50\), then
- The contents of the shovel will “look like” the contents of the bowl, thus
- Any results based on the sample of \(n=50\) balls can generalize to the large bowl of \(N=2400\) balls, thus
- The sample proportion \(\widehat{p}\) of the \(n=50\) sampled balls in the shovel that are red is a “good guess” of the true population proportion \(p\) of the \(N=2400\) balls that are red.

      and thus we have inferred some new piece of information about the bowl based on our sample extracted by shovel: the proportion of balls that are red. In the Obama poll example:

- If we had a way of contacting a randomly chosen sample of 2089 young Americans and polling their approval of Obama, then
- These 2089 young Americans would “look like” the population of all young Americans, thus
- Any results based on this sample of 2089 young Americans can generalize to the entire population of all young Americans, thus
- The reported sample approval rating of 41% of these 2089 young Americans is a “good guess” of the true approval rating amongst all young Americans.

      So long story short, this poll’s guess of Obama’s approval rating was 41%. However is this the end of the story when understanding the results of a poll? If you read further in the article, it states:


      The online survey of 2,089 adults was conducted from Oct. 30 to Nov. 11, just weeks after the federal government shutdown ended and the problems surrounding the implementation of the Affordable Care Act began to take center stage. The poll’s margin of error was plus or minus 2.1 percentage points.


Note the term margin of error, which here is plus or minus 2.1 percentage points. This is saying that a typical range of errors for polls of this type is about \(\pm 2.1\%\), in other words from about 2.1% too small to about 2.1% too big. These errors are caused by sampling variation, the same sampling variation you saw studied in the histograms in Section 8.2 on our tactile sampling simulations and Section 8.3 on our virtual sampling simulations.


In the case of polls, any deviation from the true approval rating is an “error,” and a reasonable range of errors is the margin of error. We’ll see in the next chapter that this is what’s known as a 95% confidence interval for the unknown approval rating. We’ll study confidence intervals using a new package for our data science and statistical toolbox: the infer package for statistical inference.
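
As a back-of-the-envelope check, a margin of error for a proportion is approximately 2 standard errors, i.e. roughly \(2\sqrt{\widehat{p}(1-\widehat{p})/n}\); this is an approximation we lean on here, and its connection to 95% confidence intervals is made precise in the next chapter. Plugging in the poll’s reported numbers gives a value of about 0.021, in line with the reported plus or minus 2.1 percentage points; a sketch:

```r
# Back-of-the-envelope margin of error: roughly 2 standard errors for a proportion.
p_hat <- 0.41   # reported approval rating
n <- 2089       # reported sample size
2 * sqrt(p_hat * (1 - p_hat) / n)
```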


      8.5 Conclusion


      8.5.1 Central Limit Theorem


What you did in Sections 8.2 and 8.3 was demonstrate a very famous theorem, or mathematically proven truth, called the Central Limit Theorem. It loosely states that when sample means and sample proportions are based on larger and larger samples, the sampling distributions corresponding to these point estimates become:

1. More and more normal
2. More and more narrow

Shuyi Chiou, Casey Dunn, and Pathikrit Bhattacharyya created the following three minute and 38 second video explaining this crucial statistical theorem using, as examples, what else?

1. The average weight of wild bunny rabbits!
2. The average wing span of dragons!

      8.5.2 What’s to come?


      This chapter serves as an introduction to the theoretical underpinning of the statistical inference techniques that will be discussed in greater detail in Chapter 9 for confidence intervals and Chapter 10 for hypothesis testing.


      8.5.3 Script of R code


      An R script file of all R code used in this chapter is available here.

diff --git a/docs/previous_versions/v0.4.0/9-confidence-intervals.html b/docs/previous_versions/v0.4.0/9-confidence-intervals.html
new file mode 100644
index 000000000..00a84b19c
--- /dev/null
+++ b/docs/previous_versions/v0.4.0/9-confidence-intervals.html
@@ -0,0 +1,1806 @@

      9 Confidence Intervals


In Chapter 8, we explored the process of repeatedly sampling from a population to build a sampling distribution. The motivation there was to use multiple samples from the same population to visualize and attempt to understand the variability in the statistic from one sample to another. Furthermore, recall our concepts and terminology related to sampling from the beginning of Chapter 8:


Generally speaking, we learned that if the sampling of a sample of size \(n\) is done at random, then the resulting sample is unbiased and representative of the population, thus any result based on the sample can generalize to the population, and hence the point estimate/sample statistic computed from this sample is a “good guess” of the unknown population parameter of interest.


      Specific to the bowl, we learned that if we properly mix the balls first thereby ensuring the randomness of samples extracted using the shovel with \(n=50\) slots, then the contents of the shovel will “look like” the contents of the bowl, thus any results based on the sample of \(n=50\) balls can generalize to the large bowl of \(N=2400\) balls, and hence the sample proportion red \(\widehat{p}\) of the \(n=50\) balls in the shovel is a “good guess” of the true population proportion red \(p\) of the \(N=2400\) balls in the bowl.


      We emphasize that we used a point estimate/sample statistic, in this case the sample proportion \(\widehat{p}\), to estimate the unknown value of the population parameter, in this case the population proportion \(p\). In other words, we are using the sample to infer about the population.


We can however consider inferential situations other than just those involving proportions. We present a wide array of such scenarios in the table below. In all 7 cases, the point estimate/sample statistic estimates the unknown population parameter. It does so by computing summary statistics based on a sample of size \(n\).

| Scenario | Population parameter | Population Notation | Point estimate/sample statistic | Sample Notation |
|---|---|---|---|---|
| 1 | Population proportion | \(p\) | Sample proportion | \(\widehat{p}\) |
| 2 | Population mean | \(\mu\) | Sample mean | \(\overline{x}\) |
| 3 | Difference in population proportions | \(p_1 - p_2\) | Difference in sample proportions | \(\widehat{p}_1 - \widehat{p}_2\) |
| 4 | Difference in population means | \(\mu_1 - \mu_2\) | Difference in sample means | \(\overline{x}_1 - \overline{x}_2\) |
| 5 | Population standard deviation | \(\sigma\) | Sample standard deviation | \(s\) |
| 6 | Population regression intercept | \(\beta_0\) | Sample regression intercept | \(\widehat{\beta}_0\) or \(b_0\) |
| 7 | Population regression slope | \(\beta_1\) | Sample regression slope | \(\widehat{\beta}_1\) or \(b_1\) |

      We’ll cover the first four scenarios in this chapter on confidence intervals and the following one on hypothesis testing:

- Scenario 2 about means. Ex: the average age of pennies.
- Scenario 3 about differences in proportions between two groups. Ex: the difference in high school completion rates for Canadians vs non-Canadians. We call this a situation of two-sample inference.
- Scenario 4 is similar to 3, but it’s about the means of two groups. Ex: the difference in average test scores for the morning section of a class versus the afternoon section of a class. This is another situation of two-sample inference.

      In contrast to these, Scenario 5 involves a measure of spread: the standard deviation. Does the spread/variability of a sample match the spread/variability of the population? However, we leave this topic for a more intermediate course on statistical inference.


In Chapter 11 on inference for regression, we’ll cover Scenarios 6 & 7 about the regression line. In particular we’ll see that the fitted regression line from Chapter 6 on basic regression, \(\widehat{y} = b_0 + b_1 \cdot x\), is in fact an estimate of some true population regression line \(y = \beta_0 + \beta_1 \cdot x\) based on a sample of \(n\) pairs of points \((x, y)\). Ex: Recall our sample of \(n=463\) instructors at UT Austin from the evals data set in Chapter 6. Based on the results of the fitted regression model of teaching score with beauty score as an explanatory/predictor variable, what can we say about this relationship for all instructors, not just those at UT Austin?


      In most cases, we don’t have the population values as we did with the bowl of balls. We only have a single sample of data from a larger population. We’d like to be able to make some reasonable guesses about population parameters using that single sample to create a range of plausible values for a population parameter. This range of plausible values is known as a confidence interval and will be the focus of this chapter. And how do we use a single sample to get some idea of how other samples might vary in terms of their statistic values? One common way this is done is via a process known as bootstrapping that will be the focus of the beginning sections of this chapter.


      Needed packages


      Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). If needed, read Section 2.3 for information on how to install and load R packages.

```r
library(dplyr)
library(ggplot2)
library(janitor)
library(moderndive)
library(infer)
```

      DataCamp


      Our approach of using data science tools to understand the first major component of statistical inference, confidence intervals, uses the same tools as in Mine Cetinkaya-Rundel and Andrew Bray’s DataCamp courses “Inference for Numerical Data” and “Inference for Categorical Data.” If you’re interested in complementing your learning below in an interactive online environment, click on the images below to access the courses.


      9.1 Bootstrapping


      9.1.1 Data explanation


      The moderndive package contains a sample of 40 pennies collected and minted in the United States. Let’s explore this sample data first:

```r
pennies_sample
```

```
# A tibble: 40 x 2
    year age_in_2011
   <int>       <int>
 1  2005           6
 2  1981          30
 3  1977          34
 4  1992          19
 5  2005           6
 6  2006           5
 7  2000          11
 8  1992          19
 9  1988          23
10  1996          15
# … with 30 more rows
```

The pennies_sample data frame has 40 rows, each corresponding to a single penny, with two variables:

- year of minting as shown on the penny and
- age_in_2011 giving the years the penny had been in circulation from 2011 as an integer, e.g. 15, 2, etc.

      Suppose we are interested in understanding some properties of the mean age of all US pennies from this data collected in 2011. How might we go about that? Let’s begin by understanding some of the properties of pennies_sample using data wrangling from Chapter 5 and data visualization from Chapter 3.


      9.1.2 Exploratory data analysis


      First, let’s visualize the values in this sample as a histogram:

```r
ggplot(pennies_sample, aes(x = age_in_2011)) +
  geom_histogram(bins = 10, color = "white")
```

      We see a roughly symmetric distribution here that has quite a few values near 20 years in age with only a few larger than 40 years or smaller than 5 years. If pennies_sample is a representative sample from the population, we’d expect the age of all US pennies collected in 2011 to have a similar shape, a similar spread, and similar measures of central tendency like the mean.


      So where does the mean value fall for this sample? This point will be known as our point estimate and provides us with a single number that could serve as the guess to what the true population mean age might be. Recall how to find this using the dplyr package:

```r
x_bar <- pennies_sample %>% 
  summarize(stat = mean(age_in_2011))
x_bar
```

```
# A tibble: 1 x 1
   stat
  <dbl>
1  25.1
```

We’ve denoted this sample mean as \(\bar{x}\), which is the standard symbol for denoting the mean of a sample. Our point estimate is, thus, \(\bar{x} = 25.1\). Note though that this is just one sample, providing just one guess at the population mean. What if we’d like to have another guess?


This should all sound similar to what we did in Chapter 8. There, instead of collecting just a single scoop of balls, we had many different students use the shovel to scoop different samples of red and white balls. We then calculated a sample statistic (the sample proportion) from each sample. But, we don’t have a population to pull from here with the pennies. We only have this one sample.


The process of bootstrapping allows us to use a single sample to generate many different samples that will act as our way of approximating a sampling distribution, using a created bootstrap distribution instead. We will pull ourselves up by our bootstraps using a single sample (pennies_sample) to get an idea of the grander sampling distribution.


      9.1.3 The Bootstrapping Process


      Bootstrapping uses a process of sampling with replacement from our original sample to create new bootstrap samples of the same size as our original sample. We can again make use of the rep_sample_n() function to explore what one such bootstrap sample would look like. Remember that we are randomly sampling from the original sample here with replacement and that we always use the same sample size for the bootstrap samples as the size of the original sample (pennies_sample).

```r
bootstrap_sample1 <- pennies_sample %>% 
  rep_sample_n(size = 40, replace = TRUE, reps = 1)
bootstrap_sample1
```

```
# A tibble: 40 x 3
# Groups:   replicate [1]
   replicate  year age_in_2011
       <int> <int>       <int>
 1         1  1983          28
 2         1  2000          11
 3         1  2004           7
 4         1  1981          30
 5         1  1993          18
 6         1  2006           5
 7         1  1981          30
 8         1  2004           7
 9         1  1992          19
10         1  1994          17
# … with 30 more rows
```

      Let’s visualize what this new bootstrap sample looks like:

```r
ggplot(bootstrap_sample1, aes(x = age_in_2011)) +
  geom_histogram(bins = 10, color = "white")
```

      We now have another sample from what we could assume comes from the population of interest. We can similarly calculate the sample mean of this bootstrap sample, called a bootstrap statistic.

```r
bootstrap_sample1 %>% 
  summarize(stat = mean(age_in_2011))
```

```
# A tibble: 1 x 2
  replicate  stat
      <int> <dbl>
1         1  23.2
```

      We can see that this sample mean is smaller than the x_bar value we calculated earlier for the pennies_sample data. We’ll come back to analyzing the different bootstrap statistic values shortly.


      Let’s recap what was done to get to this bootstrap sample using a tactile explanation:

1. First, pretend that each of the 40 values of age_in_2011 in pennies_sample were written on a small piece of paper. Recall that these values were 6, 30, 34, 19, 6, etc.
2. Now, put the 40 small pieces of paper into a receptacle such as a baseball cap.
3. Shake up the pieces of paper.
4. Draw “at random” from the cap to select one piece of paper.
5. Write down the value on this piece of paper. Say that it is 28.
6. Now, place this piece of paper containing 28 back into the cap.
7. Draw “at random” again from the cap to select a piece of paper. Note that this is the sampling with replacement part since you may draw 28 again.
8. Repeat this process until you have drawn 40 pieces of paper and written down the values on these 40 pieces of paper. Completing this repetition produces ONE bootstrap sample.

      If you look at the values in bootstrap_sample1, you can see how this process plays out. We originally drew 28, then we drew 11, then 7, and so on. Of course, we didn’t actually use pieces of paper and a cap here. We just had the computer perform this process for us to produce bootstrap_sample1 using rep_sample_n() with replace = TRUE set.


      The process of sampling with replacement is how we can use the original sample to take a guess as to what other values in the population may be. Sometimes in these bootstrap samples, we will select lots of larger values from the original sample, sometimes we will select lots of smaller values, and most frequently we will select values that are near the center of the sample. Let’s explore what the distribution of values of age_in_2011 for six different bootstrap samples looks like to further understand this variability.

```r
six_bootstrap_samples <- pennies_sample %>% 
  rep_sample_n(size = 40, replace = TRUE, reps = 6)

ggplot(six_bootstrap_samples, aes(x = age_in_2011)) +
  geom_histogram(bins = 10, color = "white") +
  facet_wrap(~ replicate)
```

      We can also look at the six different means using dplyr syntax:

```r
six_bootstrap_samples %>% 
  group_by(replicate) %>% 
  summarize(stat = mean(age_in_2011))
```

```
# A tibble: 6 x 2
  replicate  stat
      <int> <dbl>
1         1  23.6
2         2  24.1
3         3  25.2
4         4  23.1
5         5  24.0
6         6  24.7
```

      Instead of doing this six times, we could do it 1000 times and then look at the distribution of stat across all 1000 of the replicates. This sets the stage for the infer R package (Bray et al. 2019) that was created to help users perform statistical inference such as confidence intervals and hypothesis tests using verbs similar to what you’ve seen with dplyr. We’ll walk through setting up each of the infer verbs for confidence intervals using this pennies_sample example, while also explaining the purpose of the verbs in a general framework.
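
In fact, we could already build such a bootstrap distribution with the Chapter 8 tools alone; here is a sketch using only rep_sample_n() and dplyr, before we switch to the infer verbs:

```r
# 1000 bootstrap samples of size 40 and the mean of each, built with only
# rep_sample_n() and dplyr verbs.
bootstrap_means <- pennies_sample %>% 
  rep_sample_n(size = 40, replace = TRUE, reps = 1000) %>% 
  group_by(replicate) %>% 
  summarize(stat = mean(age_in_2011))
```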


      9.2 The infer package for statistical inference


      The infer package makes great use of the %>% to create a pipeline for statistical inference. The goal of the package is to provide a way for its users to explain the computational process of confidence intervals and hypothesis tests using the code as a guide. The verbs build in order here, so you’ll want to start with specify() and then continue through the others as needed.
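
As a preview, here is a sketch of how these verbs will chain together for the pennies example; each verb is unpacked one at a time in the subsections that follow:

```r
# Preview of the full infer pipeline built up in the rest of this section.
pennies_sample %>% 
  specify(response = age_in_2011) %>%            # which variable we infer about
  generate(reps = 1000, type = "bootstrap") %>%  # 1000 bootstrap resamples
  calculate(stat = "mean") %>%                   # the mean of each resample
  visualize()                                    # histogram of the bootstrap distribution
```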


      9.2.1 Specify variables


The specify() function is used primarily to choose which variables will be the focus of the statistical inference. In addition, this is where you set which variable acts as the explanatory variable and which acts as the response variable. For proportion problems like those in Chapter 8, we can also specify which of the different levels we would like to count as a success. We’ll see further examples of these options in this chapter, Chapter 10, and in Appendix B.


      To begin to create a confidence interval for the population mean age of US pennies in 2011, we start by using specify() to choose which variable in our pennies_sample data we’d like to work with. This can be done in one of two ways:

1. Using the response argument:

```r
pennies_sample %>% 
  specify(response = age_in_2011)
```

```
Response: age_in_2011 (integer)
# A tibble: 40 x 1
   age_in_2011
         <int>
 1           6
 2          30
 3          34
 4          19
 5           6
 6           5
 7          11
 8          19
 9          23
10          15
# … with 30 more rows
```

2. Using formula notation:

```r
pennies_sample %>% 
  specify(formula = age_in_2011 ~ NULL)
```

```
Response: age_in_2011 (integer)
# A tibble: 40 x 1
   age_in_2011
         <int>
 1           6
 2          30
 3          34
 4          19
 5           6
 6           5
 7          11
 8          19
 9          23
10          15
# … with 30 more rows
```

      Note that the formula notation uses the common R methodology to include the response \(y\) variable on the left of the ~ and the explanatory \(x\) variable on the right of the “tilde.” Recall that you used this notation frequently with the lm() function in Chapters 6 and 7 when fitting regression models. Either notation works just fine, but a preference is usually given here for the formula notation to further build on the ideas from earlier chapters.


      9.2.2 Generate replicates


      After specify()ing the variables we’d like in our inferential analysis, we next feed that into the generate() verb. The generate() verb’s main argument is reps, which is used to give how many different repetitions one would like to perform. Another argument here is type, which is automatically determined by the kinds of variables passed into specify(). We can also be explicit and set this type to be type = "bootstrap". This type argument will be further used in hypothesis testing in Chapter 10 as well. Make sure to check out ?generate to see the options here and use the ? operator to better understand other verbs as well.


      Let’s generate() 1000 bootstrap samples:

```r
thousand_bootstrap_samples <- pennies_sample %>% 
  specify(response = age_in_2011) %>% 
  generate(reps = 1000)
```

      We can use the dplyr count() function to help us understand what the thousand_bootstrap_samples data frame looks like:

      +
      thousand_bootstrap_samples %>% count(replicate)
      +
      # A tibble: 1,000 x 2
      +# Groups:   replicate [1,000]
      +   replicate     n
      +       <int> <int>
      + 1         1    40
      + 2         2    40
      + 3         3    40
      + 4         4    40
      + 5         5    40
      + 6         6    40
      + 7         7    40
      + 8         8    40
      + 9         9    40
      +10        10    40
      +# … with 990 more rows
      +

      Notice that each replicate has 40 entries here. Now that we have 1000 different bootstrap samples, our next step is to calculate the bootstrap statistics for each sample.


      9.2.3 Calculate summary statistics

      +

      +

      After generate()ing many different samples, we next want to condense those samples down into a single statistic for each replicated sample. As seen in the diagram, the calculate() function is helpful here.

      +

      As we did at the beginning of this chapter, we now want to calculate the mean age_in_2011 for each bootstrap sample. To do so, we use the stat argument and set it to "mean" below. The stat argument has a variety of different options here and we will see further examples of this throughout the remaining chapters.

      +
      bootstrap_distribution <- pennies_sample %>% 
      +  specify(response = age_in_2011) %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "mean")
      +bootstrap_distribution
      +
      # A tibble: 1,000 x 2
      +   replicate  stat
      +       <int> <dbl>
      + 1         1  26.5
      + 2         2  25.4
      + 3         3  26.0
      + 4         4  26  
      + 5         5  25.2
      + 6         6  29.0
      + 7         7  22.8
      + 8         8  26.4
      + 9         9  24.9
      +10        10  28.1
      +# … with 990 more rows
      +

      We see that the resulting data has 1000 rows and 2 columns corresponding to the 1000 replicates and the mean for each bootstrap sample.

      +
      +

      Observed statistic / point estimate calculations

      +

Just as group_by() %>% summarize() produces a useful workflow in dplyr, we can also use specify() %>% calculate() to compute summary measures on our original sample data. It's often helpful, both in confidence interval calculations and in hypothesis testing, to identify what the corresponding statistic is in the original data. For our example on penny age, we computed above a value of x_bar using the summarize() verb in dplyr:

      +
      pennies_sample %>% 
      +  summarize(stat = mean(age_in_2011))
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  25.1
      +

This can also be done by skipping the generate() step in the pipeline, feeding specify() directly into calculate():

      +
      pennies_sample %>% 
      +  specify(response = age_in_2011) %>% 
      +  calculate(stat = "mean")
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  25.1
      +

      This shortcut will be particularly useful when the calculation of the observed statistic is tricky to do using dplyr alone. This is particularly the case when working with more than one variable as will be seen in Chapter 10.


      9.2.4 Visualize the results

      +

      +

      The visualize() verb provides a simple way to view the bootstrap distribution as a histogram of the stat variable values. It has many other arguments that one can use as well including the shading of the histogram values corresponding to the confidence interval values.

      +
      bootstrap_distribution %>% visualize()
      +

      +

      The shape of this resulting distribution may look familiar to you. It resembles the well-known normal (bell-shaped) curve.

      +

      The following diagram recaps the infer pipeline for creating a bootstrap distribution.

      +


      9.3 Now to confidence intervals

      +

      Definition: Confidence Interval

      +

      A confidence interval gives a range of plausible values for a parameter. It depends on a specified confidence level with higher confidence levels corresponding to wider confidence intervals and lower confidence levels corresponding to narrower confidence intervals. Common confidence levels include 90%, 95%, and 99%.

      +

      Usually we don’t just begin sections with a definition, but confidence intervals are simple to define and play an important role in the sciences and any field that uses data. You can think of a confidence interval as playing the role of a net when fishing. Instead of just trying to catch a fish with a single spear (estimating an unknown parameter by using a single point estimate/statistic), we can use a net to try to provide a range of possible locations for the fish (use a range of possible values based around our statistic to make a plausible guess as to the location of the parameter).

      +

      The bootstrapping process will provide bootstrap statistics that have a bootstrap distribution with center at (or extremely close to) the mean of the original sample. This can be seen by giving the observed statistic obs_stat argument the value of the point estimate x_bar.

      +
      bootstrap_distribution %>% visualize(obs_stat = x_bar)
      +

      +

      We can also compute the mean of the bootstrap distribution of means to see how it compares to x_bar:

      +
      bootstrap_distribution %>% 
      +  summarize(mean_of_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_means
      +          <dbl>
      +1          25.1
      +

      In this case, we can see that the bootstrap distribution provides us a guess as to what the variability in different sample means may look like only using the original sample as our guide. We can quantify this variability in the form of a 95% confidence interval in a couple different ways.

      +
      +

      9.3.1 The percentile method

      +

      One way to calculate a range of plausible values for the unknown mean age of coins in 2011 is to use the middle 95% of the bootstrap_distribution to determine our endpoints. Our endpoints are thus at the 2.5th and 97.5th percentiles. This can be done with infer using the get_ci() function. (You can also use the conf_int() or get_confidence_interval() functions here as they are aliases that work the exact same way.)

      +
      bootstrap_distribution %>% 
      +  get_ci(level = 0.95, type = "percentile")
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   21.0    29.3
      +

      These options are the default values for level and type so we can also just do:

      +
      percentile_ci <- bootstrap_distribution %>% 
      +  get_ci()
      +percentile_ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   21.0    29.3
      +

      Using the percentile method, our range of plausible values for the mean age of US pennies in circulation in 2011 is 20.972 years to 29.252 years. We can use the visualize() function to view this using the endpoints and direction arguments, setting direction to "between" (between the values) and endpoints to be those stored with name percentile_ci.

      +
      bootstrap_distribution %>% 
      +  visualize(endpoints = percentile_ci, direction = "between")
      +

      +

      You can see that 95% of the data stored in the stat variable in bootstrap_distribution falls between the two endpoints with 2.5% to the left outside of the shading and 2.5% to the right outside of the shading. The cut-off points that provide our range are shown with the darker lines.


      9.3.2 The standard error method

      +

      If the bootstrap distribution is close to symmetric and bell-shaped, we can also use a shortcut formula for determining the lower and upper endpoints of the confidence interval. This is done by using the formula \(\bar{x} \pm (multiplier * SE),\) where \(\bar{x}\) is our original sample mean and \(SE\) stands for standard error and corresponds to the standard deviation of the bootstrap distribution. The value of \(multiplier\) here is the appropriate percentile of the standard normal distribution.

      +

      These are automatically calculated when level is provided with level = 0.95 being the default. (95% of the values in a standard normal distribution fall within 1.96 standard deviations of the mean, so \(multiplier = 1.96\) for level = 0.95, for example.) As mentioned, this formula assumes that the bootstrap distribution is symmetric and bell-shaped. This is often the case with bootstrap distributions, especially those in which the original distribution of the sample is not highly skewed.
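As a rough sketch (not output shown in the book), the same calculation could be done "by hand" from the bootstrap distribution, using pull() from dplyr to extract the numeric values from the x_bar and bootstrap_distribution objects created earlier:

# A sketch of the standard error method done "by hand"
multiplier <- qnorm(0.975)                   # roughly 1.96 for a 95% level
se <- bootstrap_distribution %>% 
  summarize(se = sd(stat)) %>% 
  pull(se)
x_bar_value <- x_bar %>% pull(stat)          # numeric value of the sample mean
c(lower = x_bar_value - multiplier * se, 
  upper = x_bar_value + multiplier * se)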

      +

      Definition: standard error

      +

      The standard error is the standard deviation of the sampling distribution.

      +

      The variability of the sampling distribution may be approximated by the variability of the bootstrap distribution. Traditional theory-based methodologies for inference also have formulas for standard errors, assuming some conditions are met.

      +

      This \(\bar{x} \pm (multiplier * SE)\) formula is implemented in the get_ci() function as shown with our pennies problem using the bootstrap distribution’s variability as an approximation for the sampling distribution’s variability. We’ll see more on this approximation shortly.

      +

      Note that the center of the confidence interval (the point_estimate) must be provided for the standard error confidence interval.

      +
      standard_error_ci <- bootstrap_distribution %>% 
      +  get_ci(type = "se", point_estimate = x_bar)
      +standard_error_ci
      +
      # A tibble: 1 x 2
      +  lower upper
      +  <dbl> <dbl>
      +1  21.0  29.3
      +
      bootstrap_distribution %>% 
      +  visualize(endpoints = standard_error_ci, direction = "between")
      +

      +

      We see that both methods produce nearly identical confidence intervals with the percentile method being \([20.97, 29.25]\) and the standard error method being \([20.97, 29.28]\).


      9.4 Comparing bootstrap and sampling distributions

      +

To help build up the idea of a confidence interval, we weren't completely honest in our initial discussion. The pennies_sample data frame represents a sample from a larger number of pennies stored as pennies in the moderndive package. The pennies data frame (also in the moderndive package) contains 800 rows of data and two columns pertaining to the same variables as pennies_sample. Let's begin by understanding some of the properties of the age_in_2011 variable in the pennies data frame.

      +
      ggplot(pennies, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +
      pennies %>% 
      +  summarize(mean_age = mean(age_in_2011),
      +            median_age = median(age_in_2011))
      +
      # A tibble: 1 x 2
      +  mean_age median_age
      +     <dbl>      <dbl>
      +1     21.2         20
      +

      We see that pennies is slightly right-skewed with the mean being pulled towards the upper outliers. Recall that pennies_sample was more symmetric than pennies. In fact, it actually exhibited some left-skew as we compare the mean and median values.

      +
      ggplot(pennies_sample, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +
      pennies_sample %>% 
      +  summarize(mean_age = mean(age_in_2011),
      +            median_age = median(age_in_2011))
      +
      # A tibble: 1 x 2
      +  mean_age median_age
      +     <dbl>      <dbl>
      +1     25.1       25.5
      +
      +

      Sampling distribution

      +

      Let’s assume that pennies represents our population of interest. We can then create a sampling distribution for the population mean age of pennies, denoted by the Greek letter \(\mu\), using the rep_sample_n() function seen in Chapter 8. First we will create 1000 samples from the pennies data frame.

      +
      thousand_samples <- pennies %>% 
      +  rep_sample_n(size = 40, reps = 1000, replace = FALSE)
      +

      When creating a sampling distribution, we do not replace the items when we create each sample. This is in contrast to the bootstrap distribution. It’s important to remember that the sampling distribution is sampling without replacement from the population to better understand sample-to-sample variability, whereas the bootstrap distribution is sampling with replacement from our original sample to better understand potential sample-to-sample variability.
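To make the contrast concrete, here is a sketch (not run in the book) of what the two calls would look like side by side; only the replace argument and the data frame being sampled from differ.

# Sampling distribution: sample the population without replacement
pennies %>% 
  rep_sample_n(size = 40, reps = 1000, replace = FALSE)

# Bootstrap distribution: resample the original sample with replacement
pennies_sample %>% 
  rep_sample_n(size = 40, reps = 1000, replace = TRUE)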

      +

      After sampling from pennies 1000 times, we next want to compute the mean age for each of the 1000 samples:

      +
      sampling_distribution <- thousand_samples %>% 
      +  group_by(replicate) %>% 
      +  summarize(stat = mean(age_in_2011))
      +

We could use ggplot() with geom_histogram() again, but since we've named our column in summarize() to be stat, we can also use the shortcut visualize() function from infer, specifying the number of bins and filling the bars with a different color such as "salmon". This color choice will help us remember that "salmon" corresponds to "sampling distribution".

      +
      sampling_distribution %>% 
      +  visualize(bins = 10, fill = "salmon")
      +
Figure 9.1: Sampling distribution for n=40 samples of pennies

      +
      +

      We can also examine the variability in this sampling distribution by calculating the standard deviation of the stat column. Remember that the standard deviation of the sampling distribution is the standard error, frequently denoted as se.

      +
      sampling_distribution %>% 
      +  summarize(se = sd(stat))
      +
      # A tibble: 1 x 1
      +     se
      +  <dbl>
      +1  2.01
      +
      +
      +

      Bootstrap distribution

      +

      Let’s now see how the shape of the bootstrap distribution compares to that of the sampling distribution. We’ll shade the bootstrap distribution blue to further assist with remembering which is which.

      +
      bootstrap_distribution %>% 
      +  visualize(bins = 10, fill = "blue")
      +

      +
      bootstrap_distribution %>% 
      +  summarize(se = sd(stat))
      +
      # A tibble: 1 x 1
      +     se
      +  <dbl>
      +1  2.12
      +

      Notice that while the standard deviations are similar, the center of the sampling distribution and the bootstrap distribution differ:

      +
      sampling_distribution %>% 
      +  summarize(mean_of_sampling_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_sampling_means
      +                   <dbl>
      +1                   21.2
      +
      bootstrap_distribution %>% 
      +  summarize(mean_of_bootstrap_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_bootstrap_means
      +                    <dbl>
      +1                    25.1
      +

      Since the bootstrap distribution is centered at the original sample mean, it doesn’t necessarily provide a good estimate of the overall population mean \(\mu\). Let’s calculate the mean of age_in_2011 for the pennies data frame to see how it compares to the mean of the sampling distribution and the mean of the bootstrap distribution.

      +
      pennies %>% 
      +  summarize(overall_mean = mean(age_in_2011))
      +
      # A tibble: 1 x 1
      +  overall_mean
      +         <dbl>
      +1         21.2
      +

Notice that this value matches up well with the mean of the sampling distribution. This is a consequence of the Central Limit Theorem introduced in Chapter 8: the mean of the sampling distribution is expected to be the mean of the overall population.

      +

The unfortunate fact, though, is that in nearly all circumstances we don't know the population mean. The motivation for presenting it here was to show that the theory behind the Central Limit Theorem holds, using the tools you've worked with so far: the ggplot2, dplyr, moderndive, and infer packages.

      +

If the sample mean is not guaranteed to be a good guess for the population mean, how should we go about estimating what the population mean may be when we can only select samples from the population? We've now come full circle and can discuss the underpinnings of the confidence interval and ways to interpret it.


      9.5 Interpreting the confidence interval

      +

      As shown above in Subsection 9.3.1, one range of plausible values for the population mean age of pennies in 2011, denoted by \(\mu\), is \([20.97, 29.25]\). Recall that this confidence interval is based on bootstrapping using pennies_sample. Note that the mean of pennies (21.152) does fall in this confidence interval. If we had a different sample of size 40 and constructed a confidence interval using the same method, would we be guaranteed that it contained the population parameter value as well? Let’s try it out:

      +
      pennies_sample2 <- pennies %>% 
      +  sample_n(size = 40)
      +

      Note the use of the sample_n() function in the dplyr package here. This does the same thing as rep_sample_n(reps = 1) but omits the extra replicate column.

      +

      We next create an infer pipeline to generate a percentile-based 95% confidence interval for \(\mu\):

      +
      percentile_ci2 <- pennies_sample2 %>% 
      +  specify(formula = age_in_2011 ~ NULL) %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "mean") %>% 
      +  get_ci()
      +percentile_ci2
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   18.4    25.3
      +

      This new confidence interval also contains the value of \(\mu\). Let’s further investigate by repeating this process 100 times to get 100 different confidence intervals derived from 100 different samples of pennies. Each sample will have size of 40 just as the original sample. We will plot each of these confidence intervals as horizontal lines. We will also show a line corresponding to the known population value of 21.152 years.
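The code that produced the figure is not shown in the book; the following is a minimal sketch of one way such a plot could be built (the object name pennies_cis and the use of purrr::map_dfr() are our own choices, not the book's code):

# Sketch: 100 percentile-based 95% CIs from 100 samples of size 40
library(purrr)

pennies_cis <- map_dfr(1:100, function(i) {
  pennies %>% 
    sample_n(size = 40) %>% 
    specify(response = age_in_2011) %>% 
    generate(reps = 1000, type = "bootstrap") %>% 
    calculate(stat = "mean") %>% 
    get_ci()
}, .id = "sample")

ggplot(pennies_cis, aes(y = as.integer(sample))) +
  geom_segment(aes(x = `2.5%`, xend = `97.5%`, yend = as.integer(sample))) +
  geom_vline(xintercept = 21.152, color = "red") +
  labs(x = "Age in 2011 (years)", y = "Sample")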

      +

      +

      Of the 100 confidence intervals based on samples of size \(n = 40\), 96 of them captured the population mean \(\mu = 21.152\), whereas 4 of them did not include it. If we repeated this process of building confidence intervals more times with more samples, we’d expect 95% of them to contain the population mean. In other words, the procedure we have used to generate confidence intervals is “95% reliable” in that we can expect it to include the true population parameter 95% of the time if the process is repeated.

      +

      To further accentuate this point, let’s perform a similar procedure using 90% confidence intervals instead. This time we will use the standard error method instead of the percentile method for computing the confidence intervals.
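For a single sample, such an interval could be computed as in the following sketch (reusing pennies_sample2 from above; the object name x_bar2 is our own):

# Sketch: a 90% standard-error-method confidence interval from one sample
x_bar2 <- pennies_sample2 %>% 
  specify(response = age_in_2011) %>% 
  calculate(stat = "mean")

pennies_sample2 %>% 
  specify(response = age_in_2011) %>% 
  generate(reps = 1000, type = "bootstrap") %>% 
  calculate(stat = "mean") %>% 
  get_ci(level = 0.90, type = "se", point_estimate = x_bar2)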

      +

      +

Of the 100 confidence intervals based on samples of size \(n = 40\), 87 of them captured the population mean \(\mu = 21.152\), whereas 13 of them did not include it. Repeating this process for more samples would result in us getting closer and closer to 90% of the confidence intervals including the true value. When interpreting a confidence interval, it is common to say we are "95% confident" or "90% confident" that the true value falls within its range. We will use this "confident" language throughout the rest of this chapter, but remember that it has more to do with the reliability of the interval-building process.

      +
      +

      Back to our pennies example

      +

      After this elaboration on what the level corresponds to in a confidence interval, let’s conclude by providing an interpretation of the original confidence interval result we found in Subsection 9.3.1.

      +

      Interpretation: We are 95% confident that the true mean age of pennies in circulation in 2011 is between 20.972 and 29.252 years. This level of confidence is based on the percentile-based method including the true mean 95% of the time if many different samples (not just the one we used) were collected and confidence intervals were created.


      9.6 EXAMPLE: One proportion

      +

      Let’s revisit our exercise of trying to estimate the proportion of red balls in the bowl from Chapter 8. We are now interested in determining a confidence interval for population parameter \(p\), the proportion of balls that are red out of the total \(N = 2400\) red and white balls.

      +

      We will use the first sample reported from Ilyas and Yohan in Subsection 8.2.2 for our point estimate. They observed 21 red balls out of the 50 in their shovel. This data is stored in the tactile_shovel1 data frame in the moderndive package.

      tactile_shovel1
      +
      # A tibble: 50 x 1
      +   color
      +   <chr>
      + 1 red  
      + 2 red  
      + 3 white
      + 4 red  
      + 5 white
      + 6 red  
      + 7 red  
      + 8 white
      + 9 red  
      +10 white
      +# … with 40 more rows
      +
      +

      9.6.1 Observed Statistic

      +

      To compute the proportion that are red in this data we can use the specify() %>% calculate() workflow. Note the use of the success argument here to clarify which of the two colors "red" or "white" we are interested in.

      +
      p_hat <- tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  calculate(stat = "prop")
      +p_hat
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  0.42
      +
      +
      +

      9.6.2 Bootstrap distribution

      +

      Next we want to calculate many different bootstrap samples and their corresponding bootstrap statistic (the proportion of red balls). We’ve done 1000 in the past, but let’s go up to 10,000 now to better see the resulting distribution. Recall that this is done by including a generate() function call in the middle of our pipeline:

      +
      tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  generate(reps = 10000)
      +

      This results in 50 rows for each of the 10,000 replicates. Lastly, we finish the infer pipeline by adding back in the calculate() step.

      +
      bootstrap_props <- tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "prop")
      +

      Let’s visualize() what the resulting bootstrap distribution looks like as a histogram. We’ve adjusted the number of bins here as well to better see the resulting shape.

      +
      bootstrap_props %>% visualize(bins = 25)
      +

      +

      We see that the resulting distribution is symmetric and bell-shaped so it doesn’t much matter which confidence interval method we choose. Let’s use the standard error method to create a 95% confidence interval.

      +
      standard_error_ci <- bootstrap_props %>% 
      +  get_ci(type = "se", level = 0.95, point_estimate = p_hat)
      +standard_error_ci
      +
      # A tibble: 1 x 2
      +  lower upper
      +  <dbl> <dbl>
      +1 0.284 0.556
      +
      bootstrap_props %>% 
      +  visualize(bins = 25, endpoints = standard_error_ci)
      +

      +

We are 95% confident that the true proportion of red balls in the bowl is between 0.284 and 0.556. This level of confidence is based on the standard error-based method including the true proportion 95% of the time if many different samples (not just the one we used) were collected and confidence intervals were created.


      9.6.3 Theory-based confidence intervals

      +

      When the bootstrap distribution has the nice symmetric, bell shape that we saw in the red balls example above, we can also use a formula to quantify the standard error. This provides another way to compute a confidence interval, but is a little more tedious and mathematical. The steps are outlined below. We’ve also shown how we can use the confidence interval (CI) interpretation in this case as well to support your understanding of this tricky concept.

      +
      +

      Procedure for building a theory-based CI for \(p\)

      +

To construct a theory-based confidence interval for \(p\), the unknown true population proportion, we follow the steps below; a short worked example follows the procedure.

      +
1. Collect a sample of size \(n\)
2. Compute \(\widehat{p}\)
3. Compute the standard error \[\text{SE} = \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
4. Compute the margin of error \[\text{MoE} = 1.96 \cdot \text{SE} = 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
5. Compute both end points of the confidence interval:
  • The lower end point lower_ci: \[\widehat{p} - \text{MoE} = \widehat{p} - 1.96 \cdot \text{SE} = \widehat{p} - 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
  • The upper end point upper_ci: \[\widehat{p} + \text{MoE} = \widehat{p} + 1.96 \cdot \text{SE} = \widehat{p} + 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
6. Alternatively, you can succinctly summarize a 95% confidence interval for \(p\) using the \(\pm\) symbol:

\[\widehat{p} \pm \text{MoE} = \widehat{p} \pm 1.96 \cdot \text{SE} = \widehat{p} \pm 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
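As a quick check of this formula, here is the calculation for Ilyas and Yohan's sample (\(\widehat{p} = 0.42\), \(n = 50\)) done directly in R; the object names here are our own, not the book's:

# Worked example of the theory-based CI formula for one sample
p_hat_obs <- 0.42
n <- 50
SE <- sqrt(p_hat_obs * (1 - p_hat_obs) / n)               # roughly 0.070
MoE <- 1.96 * SE                                          # roughly 0.137
c(lower_ci = p_hat_obs - MoE, upper_ci = p_hat_obs + MoE) # roughly (0.283, 0.557)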


      Confidence intervals based on 33 tactile samples

      +

      Let’s load the tactile sampling data for the 33 groups from Chapter 8. Recall this data was saved in the tactile_prop_red data frame included in the moderndive package.

      tactile_prop_red
      +

      Let’s now apply the above procedure for constructing confidence intervals for \(p\) using the data saved in tactile_prop_red by adding/modifying new columns using the dplyr package data wrangling tools seen in Chapter 5:

      +
1. Rename prop_red to p_hat, the official name of the sample proportion
2. Make explicit the sample size n of \(n = 50\)
3. the standard error SE
4. the margin of error MoE
5. the left endpoint of the confidence interval lower_ci
6. the right endpoint of the confidence interval upper_ci
      +
      conf_ints <- tactile_prop_red %>% 
      +  rename(p_hat = prop_red) %>% 
      +  mutate(
      +    n = 50,
      +    SE = sqrt(p_hat * (1 - p_hat) / n),
      +    MoE = 1.96 * SE,
      +    lower_ci = p_hat - MoE,
      +    upper_ci = p_hat + MoE
      +  )
      +conf_ints
group | red_balls | p_hat | n | SE | MoE | lower_ci | upper_ci
Ilyas, Yohan | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Morgan, Terrance | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471
Martin, Thomas | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Clark, Frank | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Riddhi, Karina | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493
Andrew, Tyler | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515
Julia | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515
Rachel, Lauren | 11 | 0.22 | 50 | 0.059 | 0.115 | 0.105 | 0.335
Daniel, Caroline | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427
Josh, Maeve | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471
Emily, Emily | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449
Conrad, Emily | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493
Oliver, Erik | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471
Isabel, Nam | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
X, Claire | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427
Cindy, Kimberly | 20 | 0.40 | 50 | 0.069 | 0.136 | 0.264 | 0.536
Kevin, James | 11 | 0.22 | 50 | 0.059 | 0.115 | 0.105 | 0.335
Nam, Isabelle | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Harry, Yuko | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427
Yuki, Eileen | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449
Ramses | 23 | 0.46 | 50 | 0.070 | 0.138 | 0.322 | 0.598
Joshua, Elizabeth, Stanley | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427
Siobhan, Jane | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493
Jack, Will | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449
Caroline, Katie | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Griffin, Y | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493
Kaitlin, Jordan | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471
Ella, Garrett | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493
Julie, Hailin | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427
Katie, Caroline | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Mallory, Damani, Melissa | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557
Katie | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449
Francis, Vignesh | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515

      Let’s plot:

1. These 33 confidence intervals for \(p\): from lower_ci to upper_ci
2. The true population proportion \(p = 900 / 2400 = 0.375\) with a red vertical line

A sketch of code that could produce such a plot appears below.
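This is a minimal sketch (not the book's exact code), assuming the conf_ints data frame created above:

ggplot(conf_ints, aes(y = group)) +
  geom_segment(aes(x = lower_ci, xend = upper_ci, yend = group)) +
  geom_point(aes(x = p_hat)) +
  geom_vline(xintercept = 900 / 2400, color = "red") +
  labs(x = "Proportion red", y = "Group")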
Figure 9.2: 33 confidence intervals based on 33 tactile samples of size n=50

      +
      +

      We see that:

• In 31 cases, the confidence intervals "capture" the true \(p = 900 / 2400 = 0.375\)
• In 2 cases, the confidence intervals do not "capture" the true \(p = 900 / 2400 = 0.375\)

Thus, the confidence intervals capture the true proportion \(31 / 33 = 93.939\%\) of the time using this theory-based methodology.


      Confidence intervals based on 100 virtual samples

      +

Suppose, however, that we repeated the above process virtually rather than tactilely. We'll do this only 100 times (instead of 1000 like we did before) so that the results can fit on the screen. Again, the steps for computing a 95% confidence interval for \(p\) are:

      +
1. Collect a sample of size \(n = 50\) as we did in Chapter 8
2. Compute \(\widehat{p}\): the sample proportion red of these \(n = 50\) balls
3. Compute the standard error \(\text{SE} = \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
4. Compute the margin of error \(\text{MoE} = 1.96 \cdot \text{SE} = 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
5. Compute both end points of the confidence interval:
  • lower_ci: \(\widehat{p} - \text{MoE} = \widehat{p} - 1.96 \cdot \text{SE} = \widehat{p} - 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
  • upper_ci: \(\widehat{p} + \text{MoE} = \widehat{p} + 1.96 \cdot \text{SE} = \widehat{p} + 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
      +

      Run the following three steps, being sure to View() the resulting data frame after each step so you can convince yourself of what’s going on:

      +
      # First: Take 100 virtual samples of n=50 balls
      +virtual_samples <- bowl %>% 
      +  rep_sample_n(size = 50, reps = 100)
      +
      +# Second: For each virtual sample compute the proportion red
      +virtual_prop_red <- virtual_samples %>% 
      +  group_by(replicate) %>% 
      +  summarize(red = sum(color == "red")) %>% 
      +  mutate(prop_red = red / 50)
      +
      +# Third: Compute the 95% confidence interval as above
      +virtual_prop_red <- virtual_prop_red %>% 
      +  rename(p_hat = prop_red) %>% 
      +  mutate(
      +    n = 50,
      +    SE = sqrt(p_hat*(1-p_hat)/n),
      +    MoE = 1.96 * SE,
      +    lower_ci = p_hat - MoE,
      +    upper_ci = p_hat + MoE
      +  )
      +

      Here are the results:

      +
Figure 9.3: 100 confidence intervals based on 100 virtual samples of size n=50

      +
      +

We see that of our 100 confidence intervals based on samples of size \(n = 50\), 96 of them captured the true \(p = 900/2400\), whereas 4 of them missed. As we create more and more confidence intervals based on more and more samples, about 95% of these intervals will capture the true proportion. In other words, our procedure is "95% reliable."

      +

Theoretical methods like this have largely been used in the past since we didn't have the computing power to perform simulation-based methods such as bootstrapping. They are still commonly used, though, and if the normality assumptions are met, they provide a nice option for finding confidence intervals and performing hypothesis tests, as we will see in Chapter 10.


      9.7 EXAMPLE: Comparing two proportions

      +

      If you see someone else yawn, are you more likely to yawn? In an episode of the show Mythbusters, they tested the myth that yawning is contagious. The snippet from the show is available to view in the United States on the Discovery Network website here. More information about the episode is also available on IMDb here.

      +

      Fifty adults who thought they were being considered for an appearance on the show were interviewed by a show recruiter (“confederate”) who either yawned or did not. Participants then sat by themselves in a large van and were asked to wait. While in the van, the Mythbusters watched via hidden camera to see if the unaware participants yawned. The data frame containing the results is available at mythbusters_yawn in the moderndive package. Let’s check it out.

      +
      mythbusters_yawn
      +
      # A tibble: 50 x 3
      +    subj group   yawn 
      +   <int> <chr>   <chr>
      + 1     1 seed    yes  
      + 2     2 control yes  
      + 3     3 seed    no   
      + 4     4 seed    yes  
      + 5     5 seed    no   
      + 6     6 control no   
      + 7     7 seed    yes  
      + 8     8 control no   
      + 9     9 control no   
      +10    10 seed    no   
      +# … with 40 more rows
      +
• The participant ID is stored in the subj variable with values of 1 to 50.
• The group variable is either "seed" for when a confederate was trying to influence the participant or "control" if a confederate did not interact with the participant.
• The yawn variable is either "yes" if the participant yawned or "no" if the participant did not yawn.

      We can use the janitor package to get a glimpse into this data in a table format:

      +
      mythbusters_yawn %>% 
      +  tabyl(group, yawn) %>% 
      +  adorn_percentages() %>% 
      +  adorn_pct_formatting() %>% 
      +  # To show original counts
      +  adorn_ns()
      +
         group         no        yes
      + control 75.0% (12) 25.0%  (4)
      +    seed 70.6% (24) 29.4% (10)
      +

      We are interested in comparing the proportion of those that yawned after seeing a seed versus those that yawned with no seed interaction. We’d like to see if the difference between these two proportions is significantly larger than 0. If so, we’d have evidence to support the claim that yawning is contagious based on this study.

      +

      In looking over this problem, we can make note of some important details to include in our infer pipeline:

      +
• We are calling a success having a yawn value of "yes".
• Our response variable will always correspond to the variable used in the success argument, so the response variable is yawn.
• The explanatory variable is the other variable of interest here: group.

To summarize, we are looking to examine the relationship between yawning and whether or not the participant saw a "seed" yawn.

      +
      +

      9.7.1 Compute the point estimate

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group)
      +
      Error: A level of the response variable `yawn` needs to be specified for the `success` argument in `specify()`.
      +

      Note that the success argument must be specified in situations such as this where the response variable has only two levels.

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes")
      +
      Response: yawn (factor)
      +Explanatory: group (factor)
      +# A tibble: 50 x 2
      +   yawn  group  
      +   <fct> <fct>  
      + 1 yes   seed   
      + 2 yes   control
      + 3 no    seed   
      + 4 yes   seed   
      + 5 no    seed   
      + 6 no    control
      + 7 yes   seed   
      + 8 no    control
      + 9 no    control
      +10 no    seed   
      +# … with 40 more rows
      +

      We next want to calculate the statistic of interest for our sample. This corresponds to the difference in the proportion of successes.

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  calculate(stat = "diff in props")
      +
      Error: Statistic is based on a difference; specify the `order` in which to subtract the levels of the explanatory variable. `order = c("first", "second")` means `("first" - "second")`. Check `?calculate` for details.
      +

      We see another error here. To further check to make sure that R knows exactly what we are after, we need to provide the order in which R should subtract these proportions of successes. As the error message states, we’ll want to put "seed" first after c() and then "control": order = c("seed", "control"). Our point estimate is thus calculated:

      +
      obs_diff <- mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  calculate(stat = "diff in props", order = c("seed", "control"))
      +obs_diff
      +
      # A tibble: 1 x 1
      +    stat
      +   <dbl>
      +1 0.0441
      +

This value represents the proportion of those that yawned after seeing a seed yawn (0.2941) minus the proportion of those that yawned without seeing a seed (0.25).


      9.7.2 Bootstrap distribution

      +

Our next step in building a confidence interval is to create a bootstrap distribution of statistics (differences in proportions of successes). We saw how it works with a single variable in computing bootstrap means in Subsection 9.1.3 and in computing bootstrap proportions in Section 9.6, but we haven't yet worked with bootstrapping involving multiple variables.

      +

      In the infer package, bootstrapping with multiple variables means that each row is potentially resampled. Let’s investigate this by looking at the first few rows of mythbusters_yawn:

      +
      head(mythbusters_yawn)
      +
      # A tibble: 6 x 3
      +   subj group   yawn 
      +  <int> <chr>   <chr>
      +1     1 seed    yes  
      +2     2 control yes  
      +3     3 seed    no   
      +4     4 seed    yes  
      +5     5 seed    no   
      +6     6 control no   
      +

      When we bootstrap this data, we are potentially pulling the subject’s readings multiple times. Thus, we could see the entries of "seed" for group and "no" for yawn together in a new row in a bootstrap sample. This is further seen by exploring the sample_n() function in dplyr on this smaller 6 row data frame comprised of head(mythbusters_yawn). The sample_n() function can perform this bootstrapping procedure and is similar to the rep_sample_n() function in infer, except that it is not repeated but rather only performs one sample with or without replacement.

      +
      set.seed(2019)
      +
      head(mythbusters_yawn) %>% 
      +  sample_n(size = 6, replace = TRUE)
      +
      # A tibble: 6 x 3
      +   subj group   yawn 
      +  <int> <chr>   <chr>
      +1     5 seed    no   
      +2     5 seed    no   
      +3     2 control yes  
      +4     4 seed    yes  
      +5     1 seed    yes  
      +6     1 seed    yes  
      +

      We can see that in this bootstrap sample generated from the first six rows of mythbusters_yawn, we have some rows repeated. The same is true when we perform the generate() step in infer as done below.

      +
      bootstrap_distribution <- mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "diff in props", order = c("seed", "control"))
      +
      bootstrap_distribution %>% visualize(bins = 20)
      +

      +

      This distribution is roughly symmetric and bell-shaped but isn’t quite there. Let’s use the percentile-based method to compute a 95% confidence interval for the true difference in the proportion of those that yawn with and without a seed presented. The arguments are explicitly listed here but remember they are the defaults and simply get_ci() can be used.

      +
      bootstrap_distribution %>% 
      +  get_ci(type = "percentile", level = 0.95)
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1 -0.219   0.293
      +

The confidence interval shown here includes the value of 0. We'll see in Chapter 10 what this means in terms of this difference being statistically significant or not, but let's examine it a bit here first. The range of plausible values for the difference in the proportion of those that yawned with and without a seed is between -0.219 and 0.293.

      +

      Therefore, we are not sure which proportion is larger. Some of the bootstrap statistics showed the proportion without a seed to be higher and others showed the proportion with a seed to be higher. If the confidence interval was entirely above zero, we would be relatively sure (about “95% confident”) that the seed group had a higher proportion of yawning than the control group.

      +

Note that this all relates to the importance of the order argument in the calculate() function. Since we specified "seed" and then "control", positive values for the statistic correspond to the "seed" proportion being higher, whereas negative values correspond to the "control" proportion being higher.

      +

We therefore have evidence via this confidence interval that the Mythbusters' "confirmed" conclusion that "yawning is contagious" is not statistically supported.

      +
      +

Learning check

      +
      +

      Practice problems to come soon!


      9.8 Conclusion

      +
      +

      9.8.1 What’s to come?

      +

      This chapter introduced the notions of bootstrapping and confidence intervals as ways to build intuition about population parameters using only the original sample information. We also concluded with a glimpse into statistical significance and we’ll dig much further into this in Chapter 10 up next!


      9.8.2 Script of R code

      +

      An R script file of all R code used in this chapter is available here.


      A Statistical Background

      +
      +

      A.1 Basic statistical terms

      +
      +

      A.1.1 Mean

      +

      The mean is the most commonly reported measure of center. It is commonly called the “average” though this term can be a little ambiguous. The mean is the sum of all of the data elements divided by how many elements there are. If we have \(n\) data points, the mean is given by: \[Mean = \frac{x_1 + x_2 + \cdots + x_n}{n}\]


      A.1.2 Median

      +

      The median is calculated by first sorting a variable’s data from smallest to largest. After sorting the data, the middle element in the list is the median. If the middle falls between two values, then the median is the mean of those two values.


      A.1.3 Standard deviation

      +

We will next discuss the standard deviation of a sample dataset pertaining to one variable. The formula can be a little intimidating at first, but it is important to remember that it is essentially a measure of how far we expect a given data value to be from its mean:

      +

      \[Standard \, deviation = \sqrt{\frac{(x_1 - Mean)^2 + (x_2 - Mean)^2 + \cdots + (x_n - Mean)^2}{n - 1}}\]


      A.1.4 Five-number summary

      +

The five-number summary consists of five values: minimum, first quantile (25th percentile), median (50th percentile), third quantile (75th percentile), and maximum. The quantiles are calculated as

      +
• first quantile (\(Q_1\)): the median of the first half of the sorted data
• third quantile (\(Q_3\)): the median of the second half of the sorted data

      The interquartile range is defined as \(Q_3 - Q_1\) and is a measure of how spread out the middle 50% of values is. The five-number summary is not influenced by the presence of outliers in the ways that the mean and standard deviation are. It is, thus, recommended for skewed datasets.
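These measures are all available as built-in R functions; here is a small sketch using a made-up vector of values (note that R's quantile() uses an interpolation rule, so its quartiles can differ slightly from the median-of-halves definition above):

x <- c(2, 4, 4, 5, 7, 9, 11, 30)   # made-up example values
mean(x)      # mean
median(x)    # median
sd(x)        # standard deviation
summary(x)   # five-number summary (plus the mean)
IQR(x)       # interquartile range Q3 - Q1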


      A.1.5 Distribution

      +

      The distribution of a variable/dataset corresponds to generalizing patterns in the dataset. It often shows how frequently elements in the dataset appear. It shows how the data varies and gives some information about where a typical element in the data might fall. Distributions are most easily seen through data visualization.


      A.1.6 Outliers

      +

      Outliers correspond to values in the dataset that fall far outside the range of “ordinary” values. In regards to a boxplot (by default), they correspond to values below \(Q_1 - (1.5 * IQR)\) or above \(Q_3 + (1.5 * IQR)\).
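Continuing the small sketch from the five-number summary section above, these default boxplot fences could be computed as:

q1 <- quantile(x, 0.25)
q3 <- quantile(x, 0.75)
iqr <- q3 - q1
c(lower_fence = q1 - 1.5 * iqr, upper_fence = q3 + 1.5 * iqr)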

      +

      Note that these terms (aside from Distribution) only apply to quantitative variables.


      B Inference Examples

      +

      This appendix is designed to provide you with examples of the five basic hypothesis tests and their corresponding confidence intervals. Traditional theory-based methods as well as computational-based methods are presented.

      +
      +

Note: This appendix is still under construction. If you would like to contribute, please check us out on GitHub at https://github.com/moderndive/moderndive_book.

      +

Please check out our sneak peek of infer below in the meantime. For more details on infer, visit https://infer.netlify.com/.

      +

      Needed packages

      +
      library(dplyr)
      +library(ggplot2)
      +library(infer)
      +library(knitr)
      +library(readr)
      +library(janitor)
      +
      +
      +

      B.1 Inference mind map

      +

      To help you better navigate and choose the appropriate analysis, we’ve created a mind map on http://coggle.it available here and below.

      +
Figure B.1: Mind map for Inference


      B.2 One mean

      +
      +

      B.2.1 Problem statement

      +

The National Survey of Family Growth conducted by the Centers for Disease Control gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men's and women's health. One of the variables collected on this survey is the age at first marriage. 5,534 randomly sampled US women between 2006 and 2010 completed the survey. The women sampled here had been married at least once. Do we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years? (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 4])


      B.2.2 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is equal to 23 years.
• Alternative hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.

      In symbols (with annotations)

      +
• \(H_0: \mu = \mu_{0}\), where \(\mu\) represents the mean age of first marriage for all US women from 2006 to 2010 and \(\mu_0\) is 23.
• \(H_A: \mu > 23\)

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.


      B.2.3 Exploring the sample data

      +
      age_at_marriage <- read_csv("https://moderndive.com/data/ageAtMar.csv")
      +
      age_summ <- age_at_marriage %>%
      +  summarize(sample_size = n(),
      +    mean = mean(age),
      +    sd = sd(age),
      +    minimum = min(age),
      +    lower_quartile = quantile(age, 0.25),
      +    median = median(age),
      +    upper_quartile = quantile(age, 0.75),
      +    max = max(age))
      +kable(age_summ)
sample_size | mean | sd | minimum | lower_quartile | median | upper_quartile | max
5534 | 23.4 | 4.72 | 10 | 20 | 23 | 26 | 43

      The histogram below also shows the distribution of age.

      +
      ggplot(data = age_at_marriage, mapping = aes(x = age)) +
      +  geom_histogram(binwidth = 3, color = "white")
      +

      +

      The observed statistic of interest here is the sample mean:

      +
      x_bar <- age_at_marriage %>% 
      +  specify(response = age) %>% 
      +  calculate(stat = "mean")
      +x_bar
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  23.4
      +
      +

      Guess about statistical significance

      +

      We are looking to see if the observed sample mean of 23.44 is statistically greater than \(\mu_0 = 23\). They seem to be quite close, but we have a large sample size here. Let’s guess that the large sample size will lead us to reject this practically small difference.


      B.2.4 Non-traditional methods

      +
      +

      Bootstrapping for hypothesis test

      +

      In order to look to see if the observed sample mean of 23.44 is statistically greater than \(\mu_0 = 23\), we need to account for the sample size. We also need to determine a process that replicates how the original sample of size 5534 was selected.

      +

      We can use the idea of bootstrapping to simulate the population from which the sample came and then generate samples from that simulated population to account for sampling variability. Recall how bootstrapping would apply in this context:

      +
1. Sample with replacement from our original sample of 5534 women and repeat this process 10,000 times,
2. calculate the mean for each of the 10,000 bootstrap samples created in Step 1,
3. combine all of these bootstrap statistics calculated in Step 2 into a boot_distn object, and
4. shift the center of this distribution over to the null value of 23. (This is needed since it will be centered at 23.44 via the process of bootstrapping.)
      +
      set.seed(2018)
      +null_distn_one_mean <- age_at_marriage %>% 
      +  specify(response = age) %>% 
      +  hypothesize(null = "point", mu = 23) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "mean")
      +
      null_distn_one_mean %>% visualize()
      +

      +

      We can next use this distribution to observe our \(p\)-value. Recall this is a right-tailed test so we will be looking for values that are greater than or equal to 23.44 for our \(p\)-value.

      +
      null_distn_one_mean %>%
      +  visualize(obs_stat = x_bar, direction = "greater")
      +

      +
      +
      Calculate \(p\)-value
      +
      pvalue <- null_distn_one_mean %>%
      +  get_pvalue(obs_stat = x_bar, direction = "greater")
      +pvalue
      +
      # A tibble: 1 x 1
      +  p_value
      +    <dbl>
      +1       0
      +

      So our \(p\)-value is 0 and we reject the null hypothesis at the 5% level. You can also see this from the histogram above that we are far into the tail of the null distribution.


      Bootstrapping for confidence interval

      +

      We can also create a confidence interval for the unknown population parameter \(\mu\) using our sample data using bootstrapping. Note that we don’t need to shift this distribution since we want the center of our confidence interval to be our point estimate \(\bar{x}_{obs} = 23.44\).

      +
      boot_distn_one_mean <- age_at_marriage %>% 
      +  specify(response = age) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "mean")
      +
      ci <- boot_distn_one_mean %>% 
      +  get_ci()
      +ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   23.3    23.6
      +
      boot_distn_one_mean %>% 
      +  visualize(endpoints = ci, direction = "between")
      +

      +

      We see that 23 is not contained in this confidence interval as a plausible value of \(\mu\) (the unknown population mean) and the entire interval is larger than 23. This matches with our hypothesis test results of rejecting the null hypothesis in favor of the alternative (\(\mu > 23\)).

      +

      Interpretation: We are 95% confident the true mean age of first marriage for all US women from 2006 to 2010 is between 23.316 and 23.565.


      B.2.5 Traditional methods

      +
      +

      Check conditions

      +

      Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

      +
1. Independent observations: The observations are collected independently.

   The cases are selected independently through random sampling so this condition is met.

2. Approximately normal: The distribution of the response variable should be normal or the sample size should be at least 30.

   The histogram for the sample above does show some skew.
      +

      The Q-Q plot below also shows some skew.

      +
      ggplot(data = age_at_marriage, mapping = aes(sample = age)) +
      +  stat_qq()
      +

      +

      The sample size here is quite large though (\(n = 5534\)) so both conditions are met.


      Test statistic

      +

      The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population mean \(\mu\). A good guess is the sample mean \(\bar{X}\). Recall that this sample mean is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely is it for us to have observed a sample mean of \(\bar{x}_{obs} = 23.44\) or larger assuming that the population mean is 23 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can “standardize” this original test statistic of \(\bar{X}\) into a \(T\) statistic that follows a \(t\) distribution with degrees of freedom equal to \(df = n - 1\):

      +

      \[ T =\dfrac{ \bar{X} - \mu_0}{ S / \sqrt{n} } \sim t (df = n - 1) \]

      +

      where \(S\) represents the standard deviation of the sample and \(n\) is the sample size.

      +
      +
      Observed test statistic
      +

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and on understanding which formula for the test statistic applies. We can use the t_test() function to perform this analysis for us.

      +
      t_test_results <- age_at_marriage %>% 
      +  infer::t_test(formula = age ~ NULL,
      +       alternative = "greater",
      +       mu = 23)
      +t_test_results
      +
      # A tibble: 1 x 6
      +  statistic  t_df  p_value alternative lower_ci upper_ci
      +      <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>
      +1      6.94  5533 2.25e-12 greater         23.3      Inf
      +

      We see here that the \(t_{obs}\) value is 6.936.
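To connect this back to the formula above, here is a minimal “by hand” sketch (assuming the age_at_marriage data frame loaded earlier in this appendix):

x_bar_obs <- mean(age_at_marriage$age)
s_obs <- sd(age_at_marriage$age)
n <- nrow(age_at_marriage)
# (x_bar - mu_0) / (s / sqrt(n)) should reproduce the observed t statistic of about 6.94
(x_bar_obs - 23) / (s_obs / sqrt(n))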

      +
      +
      +
      +

      Compute \(p\)-value

      +

The \(p\)-value—the probability of observing a \(t_{obs}\) value of 6.936 or more in our null distribution of a \(t\) with 5533 degrees of freedom—is essentially 0.
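As a quick check, this upper-tail probability can be computed directly from the \(t\) distribution (a sketch, using the degrees of freedom reported above):

# Upper-tail area beyond the observed statistic; essentially 0,
# matching the p_value of about 2.25e-12 reported by t_test()
pt(6.936, df = 5533, lower.tail = FALSE)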

      +
      +
      +

      State conclusion

      +

      We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample mean was statistically greater than the hypothesized mean has supporting evidence here. Based on this sample, we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.

      +
      +
      +

      Confidence interval

      +
      t.test(x = age_at_marriage$age, 
      +       alternative = "two.sided",
      +       mu = 23)$conf
      +
      [1] 23.3 23.6
      +attr(,"conf.level")
      +[1] 0.95
      +
      +
      +
      +
      +

      B.2.6 Comparing results

      +

Observing the bootstrap distribution that was created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since this distribution looks very similar to a normal distribution. The conditions also being met (the large sample size was the driver here) gives us reason to expect that the traditional (formula-based) and non-traditional (computation-based) methods will lead to similar results.

      +
      +
      +
      +
      +
      +

      B.3 One proportion

      +
      +

      B.3.1 Problem statement

      +

      The CEO of a large electric utility claims that 80 percent of his 1,000,000 customers are satisfied with the service they receive. To test this claim, the local newspaper surveyed 100 customers, using simple random sampling. 73 were satisfied and the remaining were unsatisfied. Based on these findings from the sample, can we reject the CEO’s hypothesis that 80% of the customers are satisfied? [Tweaked a bit from http://stattrek.com/hypothesis-test/proportion.aspx?Tutorial=AP]

      +
      +
      +

      B.3.2 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: The proportion of all customers of the large electric utility satisfied with the service they receive is equal to 0.80.

• Alternative hypothesis: The proportion of all customers of the large electric utility satisfied with the service they receive is different from 0.80.
      +
      +
      +

      In symbols (with annotations)

      +
• \(H_0: \pi = p_{0}\), where \(\pi\) represents the proportion of all customers of the large electric utility satisfied with the service they receive and \(p_0\) is 0.8.

• \(H_A: \pi \ne 0.8\)
      +
      +
      +

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      +
      +
      +
      +

      B.3.3 Exploring the sample data

      +
      elec <- c(rep("satisfied", 73), rep("unsatisfied", 27)) %>% 
      +  as_data_frame() %>% 
      +  rename(satisfy = value)
      +

      The bar graph below also shows the distribution of satisfy.

      +
      ggplot(data = elec, aes(x = satisfy)) + 
      +  geom_bar()
      +

      +

      The observed statistic is computed as

      +
      p_hat <- elec %>% 
      +  specify(response = satisfy, success = "satisfied") %>% 
      +  calculate(stat = "prop")
      +p_hat
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  0.73
      +
      +

      Guess about statistical significance

      +

      We are looking to see if the sample proportion of 0.73 is statistically different from \(p_0 = 0.8\) based on this sample. They seem to be quite close, and our sample size is not huge here (\(n = 100\)). Let’s guess that we do not have evidence to reject the null hypothesis.

      +
      +
      +
      +
      +

      B.3.4 Non-traditional methods

      +
      +

      Simulation for hypothesis test

      +

      In order to look to see if 0.73 is statistically different from 0.8, we need to account for the sample size. We also need to determine a process that replicates how the original sample of size 100 was selected. We can use the idea of an unfair coin to simulate this process. We will simulate flipping an unfair coin (with probability of success 0.8 matching the null hypothesis) 100 times. Then we will keep track of how many heads come up in those 100 flips. Our simulated statistic matches with how we calculated the original statistic \(\hat{p}\): the number of heads (satisfied) out of our total sample of 100. We then repeat this process many times (say 10,000) to create the null distribution looking at the simulated proportions of successes:

      +
      set.seed(2018)
      +null_distn_one_prop <- elec %>% 
      +  specify(response = satisfy, success = "satisfied") %>% 
      +  hypothesize(null = "point", p = 0.8) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "prop")
      +
      null_distn_one_prop %>% visualize()
      +

      +

      We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are 0.8 - 0.73 = 0.07 away from 0.8 in BOTH directions for our \(p\)-value:

      +
      null_distn_one_prop %>% 
      +  visualize(obs_stat = p_hat, direction = "both")
      +

      +
      +
      Calculate \(p\)-value
      +
      pvalue <- null_distn_one_prop %>% 
      +  get_pvalue(obs_stat = p_hat, direction = "both")
      +pvalue
      +
      # A tibble: 1 x 1
      +  p_value
      +    <dbl>
      +1  0.0813
      +

      So our \(p\)-value is 0.081 and we fail to reject the null hypothesis at the 5% level.

      +
      +
      +
      +

      Bootstrapping for confidence interval

      +

      We can also create a confidence interval for the unknown population parameter \(\pi\) using our sample data. To do so, we use bootstrapping, which involves

      +
1. sampling with replacement from our original sample of 100 survey respondents and repeating this process 10,000 times,

2. calculating the proportion of successes for each of the 10,000 bootstrap samples created in Step 1,

3. combining all of these bootstrap statistics calculated in Step 2 into a boot_distn object,

4. identifying the 2.5th and 97.5th percentiles of this distribution (corresponding to the 5% significance level chosen) to find a 95% confidence interval for \(\pi\), and

5. interpreting this confidence interval in the context of the problem.
      +
      boot_distn_one_prop <- elec %>% 
      +  specify(response = satisfy, success = "satisfied") %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "prop")
      +

      Just as we use the mean function for calculating the mean over a numerical variable, we can also use it to compute the proportion of successes for a categorical variable where we specify what we are calling a “success” after the ==. (Think about the formula for calculating a mean and how R handles logical statements such as satisfy == "satisfied" for why this must be true.)
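For instance, a minimal sketch of this idea applied to the original sample (assuming the elec data frame created above and the dplyr package loaded earlier):

elec %>% 
  summarize(prop_satisfied = mean(satisfy == "satisfied"))
# returns 0.73, the same value as p_hat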

      +
      ci <- boot_distn_one_prop %>% 
      +  get_ci()
      +ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   0.64    0.81
      +
      boot_distn_one_prop %>% 
      +  visualize(endpoints = ci, direction = "between")
      +

      +

      We see that 0.80 is contained in this confidence interval as a plausible value of \(\pi\) (the unknown population proportion). This matches with our hypothesis test results of failing to reject the null hypothesis.

      +

      Interpretation: We are 95% confident the true proportion of customers who are satisfied with the service they receive is between 0.64 and 0.81.

      +
      +
      +
      +
      +

      B.3.5 Traditional methods

      +
      +

      Check conditions

      +

      Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

      +
1. Independent observations: The observations are collected independently.

   The cases are selected independently through random sampling so this condition is met.

2. Approximately normal: The number of expected successes and expected failures is at least 10.

   This condition is met since the expected counts, \(100 \cdot 0.8 = 80\) and \(100 \cdot 0.2 = 20\), are both greater than 10 (as are the observed counts of 73 and 27).
      +
      +
      +

      Test statistic

      +

The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population proportion \(\pi\). A good guess is the sample proportion \(\hat{P}\). Recall that this sample proportion is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely it is for us to have observed a sample proportion of \(\hat{p}_{obs} = 0.73\) or something more extreme assuming that the population proportion is 0.80 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can standardize this original test statistic of \(\hat{P}\) into a \(Z\) statistic that follows a \(N(0, 1)\) distribution.

      +

      \[ Z =\dfrac{ \hat{P} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n} }} \sim N(0, 1) \]

      +
      +
      Observed test statistic
      +

While one could compute this observed test statistic by “hand” by plugging the observed values into the formula, the focus here is on the set-up of the problem and on understanding which formula for the test statistic applies. The calculation has been done in R below for completeness though:

      +
      p_hat <- 0.73
      +p0 <- 0.8
      +n <- 100
      +(z_obs <- (p_hat - p0) / sqrt( (p0 * (1 - p0)) / n))
      +
      [1] -1.75
      +

      We see here that the \(z_{obs}\) value is around -1.75. Our observed sample proportion of 0.73 is 1.75 standard errors below the hypothesized parameter value of 0.8.

      +
      +
      +
      +

      Visualize and compute \(p\)-value

      +
      elec %>% 
      +  specify(response = satisfy, success = "satisfied") %>% 
      +  hypothesize(null = "point", p = 0.8) %>% 
      +  calculate(stat = "z") %>% 
      +  visualize(method = "theoretical", obs_stat = z_obs, direction = "both")
      +

      +
      2 * pnorm(z_obs)
      +
      [1] 0.0801
      +

The \(p\)-value—the probability of observing a \(z_{obs}\) value of -1.75 or more extreme (in both directions) in our null distribution—is around 8%.

      +

      Note that we could also do this test directly using the prop.test function.

      +
      stats::prop.test(x = table(elec$satisfy),
      +       n = length(elec$satisfy),
      +       alternative = "two.sided",
      +       p = 0.8,
      +       correct = FALSE)
      +
      
      +    1-sample proportions test without continuity correction
      +
      +data:  table(elec$satisfy), null probability 0.8
      +X-squared = 3, df = 1, p-value = 0.08
      +alternative hypothesis: true p is not equal to 0.8
      +95 percent confidence interval:
      + 0.636 0.807
      +sample estimates:
      +   p 
      +0.73 
      +

prop.test does a \(\chi^2\) test here but this matches up exactly with what we would expect: \(\chi^2_{obs} = 3.06 = (-1.75)^2 = (z_{obs})^2\) and the \(p\)-values are the same because we are focusing on a two-tailed test.
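A quick sketch of that relationship, reusing the z_obs value computed above:

# Squaring the observed z statistic recovers the chi-squared statistic
# reported by prop.test() (about 3.06, shown rounded to 3 in the output above)
z_obs^2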

      +

      Note that the 95 percent confidence interval given above matches well with the one calculated using bootstrapping.

      +
      +
      +

      State conclusion

      +

We, therefore, do not have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample proportion was not statistically different from the hypothesized proportion has not been invalidated. Based on this sample, we do not have evidence that the proportion of all customers of the large electric utility satisfied with the service they receive is different from 0.80, at the 5% level.

      +
      +
      +
      +
      +

      B.3.6 Comparing results

      +

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met gives us reason to expect that the traditional (formula-based) and non-traditional (computation-based) methods will lead to similar results.

      +
      +
      +
      +
      +
      +

      B.4 Two proportions

      +
      +

      B.4.1 Problem statement

      +

A 2010 survey asked 827 randomly sampled registered voters in California “Do you support? Or do you oppose? Drilling for oil and natural gas off the Coast of California? Or do you not know enough to say?” Conduct a hypothesis test to determine if the data provide strong evidence that the proportion of college graduates who do not have an opinion on this issue is different than that of non-college graduates. (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 6])

      +
      +
      +

      B.4.2 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: There is no association between having an opinion on drilling and having a college degree for all registered California voters in 2010.

• Alternative hypothesis: There is an association between having an opinion on drilling and having a college degree for all registered California voters in 2010.
      +
      +
      +

      Another way in words

      +
• Null hypothesis: The probability that a registered California voter in 2010 has no opinion on drilling is the same for college graduates as for non-college graduates.

• Alternative hypothesis: These parameter probabilities are different.
      +
      +
      +

      In symbols (with annotations)

      +
• \(H_0: \pi_{college} = \pi_{no\_college}\) or \(H_0: \pi_{college} - \pi_{no\_college} = 0\), where \(\pi\) represents the probability of not having an opinion on drilling.

• \(H_A: \pi_{college} - \pi_{no\_college} \ne 0\)
      +
      +
      +

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      +
      +
      +
      +

      B.4.3 Exploring the sample data

      +
      offshore <- read_csv("https://moderndive.com/data/offshore.csv")
      +
      offshore %>% tabyl(college_grad, response)
      +
       college_grad no opinion opinion
      +           no        131     258
      +          yes        104     334
      +
      off_summ <- offshore %>% 
      +  group_by(college_grad) %>% 
      +  summarize(prop_no_opinion = mean(response == "no opinion"),
      +    sample_size = n())
      +
      ggplot(offshore, aes(x = college_grad, fill = response)) +
      +  geom_bar(position = "fill") +
      +  coord_flip()
      +

      +
      +

      Guess about statistical significance

      +

      We are looking to see if a difference exists in the size of the bars corresponding to no opinion for the plot. Based solely on the plot, we have little reason to believe that a difference exists since the bars seem to be about the same size, BUT…it’s important to use statistics to see if that difference is actually statistically significant!

      +
      +
      +
      +
      +

      B.4.4 Non-traditional methods

      +
      +

      Collecting summary info

      +

      The observed statistic is

      +
      d_hat <- offshore %>% 
      +  specify(response ~ college_grad, success = "no opinion") %>% 
      +  calculate(stat = "diff in props", order = c("yes", "no"))
      +d_hat
      +
      # A tibble: 1 x 1
      +     stat
      +    <dbl>
      +1 -0.0993
      +
      +
      +

      Randomization for hypothesis test

      +

In order to see if the observed sample proportion of no opinion for non-college graduates of 0.337 is statistically different from that for college graduates of 0.237, we need to account for the sample sizes. Note that this is the same as looking to see if \(\hat{p}_{grad} - \hat{p}_{nograd}\) is statistically different than 0. We also need to determine a process that replicates how the original group sizes of 389 and 438 were selected.

      +

      We can use the idea of randomization testing (also known as permutation testing) to simulate the population from which the sample came (with two groups of different sizes) and then generate samples using shuffling from that simulated population to account for sampling variability.

      +
      set.seed(2018)
      +null_distn_two_props <- offshore %>% 
      +  specify(response ~ college_grad, success = "no opinion") %>%
      +  hypothesize(null = "independence") %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "diff in props", order = c("yes", "no"))
      +
      null_distn_two_props %>% visualize()
      +

      +

We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are less than or equal to -0.099 or greater than or equal to 0.099 for our \(p\)-value.

      +
      null_distn_two_props %>% 
      +  visualize(obs_stat = d_hat, direction = "two_sided")
      +

      +
      +
      Calculate \(p\)-value
      +
      pvalue <- null_distn_two_props %>% 
      +  get_pvalue(obs_stat = d_hat, direction = "two_sided")
      +pvalue
      +
      # A tibble: 1 x 1
      +  p_value
      +    <dbl>
      +1  0.0021
      +

So our \(p\)-value is 0.002 and we reject the null hypothesis at the 5% level. You can also see from the histogram above that we are far into the tails of the null distribution.

      +
      +
      +
      +

      Bootstrapping for confidence interval

      +

      We can also create a confidence interval for the unknown population parameter \(\pi_{college} - \pi_{no\_college}\) using our sample data with bootstrapping.

      +
      boot_distn_two_props <- offshore %>% 
      +  specify(response ~ college_grad, success = "no opinion") %>%
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "diff in props", order = c("yes", "no"))
      +
      ci <- boot_distn_two_props %>% 
      +  get_ci()
      +ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1 -0.161 -0.0378
      +
      boot_distn_two_props %>% 
      +  visualize(endpoints = ci, direction = "between")
      +

      +

      We see that 0 is not contained in this confidence interval as a plausible value of \(\pi_{college} - \pi_{no\_college}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter, we have evidence that the proportion of college graduates in California with no opinion on drilling is different than that of non-college graduates.

      +

Interpretation: We are 95% confident the true proportion of college graduates in California with no opinion on offshore drilling is between 0.04 and 0.16 smaller than that of non-college graduates.

      +
      +
      +
      +
      +

      B.4.5 Traditional methods

      +
      +
      +

      B.4.6 Check conditions

      +

      Remember that in order to use the short-cut (formula-based, theoretical) approach, we need to check that some conditions are met.

      +
        +
1. Independent observations: Each case that was selected must be independent of all the other cases selected.

   This condition is met since cases were selected at random to observe.

2. Sample size: The number of pooled successes and pooled failures must be at least 10 for each group.

   We need to first figure out the pooled success rate: \[\hat{p}_{obs} = \dfrac{131 + 104}{827} = 0.28.\] We now determine expected (pooled) success and failure counts (see the sketch after this list):

   \(0.28 \cdot (131 + 258) = 108.92\), \(0.72 \cdot (131 + 258) = 280.08\)

   \(0.28 \cdot (104 + 334) = 122.64\), \(0.72 \cdot (104 + 334) = 315.36\)

   All of these expected counts exceed 10, so this condition is met.

3. Independent selection of samples: The cases are not paired in any meaningful way.

   We have no reason to suspect that a college graduate selected would have any relationship to a non-college graduate selected.
      +
      +
      +
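Here is the sketch referenced in condition 2 above: a quick illustrative computation of the pooled rate and expected counts, using the cell counts from the table:

p_pooled <- (131 + 104) / 827      # pooled "no opinion" rate, roughly 0.28
n_no_grad <- 131 + 258             # 389 non-college graduates
n_grad <- 104 + 334                # 438 college graduates
p_pooled * c(n_no_grad, n_grad)        # expected successes in each group
(1 - p_pooled) * c(n_no_grad, n_grad)  # expected failures in each group; all exceed 10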

      B.4.7 Test statistic

      +

The test statistic is a random variable based on the sample data. Here, we are interested in seeing if our observed difference in sample proportions corresponding to no opinion on drilling (\(\hat{p}_{college, obs} - \hat{p}_{no\_college, obs} = -0.099\)) is statistically different than 0. Assuming that conditions are met and the null hypothesis is true, we can use the standard normal distribution to standardize the difference in sample proportions (\(\hat{P}_{college} - \hat{P}_{no\_college}\)) using the standard error of \(\hat{P}_{college} - \hat{P}_{no\_college}\) and the pooled estimate:

      +

      \[ Z =\dfrac{ (\hat{P}_1 - \hat{P}_2) - 0}{\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n_1} + \dfrac{\hat{P}(1 - \hat{P})}{n_2} }} \sim N(0, 1) \] where \(\hat{P} = \dfrac{\text{total number of successes} }{ \text{total number of cases}}.\)

      +
      +

      Observed test statistic

      +

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and on understanding which formula for the test statistic applies. We can use the infer pipeline to calculate the standardized statistic for us (prop.test could also be used to run the full test directly, as in the one-proportion example).

      +
      z_hat <- offshore %>% 
      +  specify(response ~ college_grad, success = "no opinion") %>% 
      +  calculate(stat = "z", order = c("yes", "no"))
      +z_hat
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1 -3.16
      +

      The observed difference in sample proportions is 3.16 standard deviations smaller than 0.
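If you want to see where that value comes from, here is a minimal “by hand” sketch of the standardization, recomputing the group and pooled proportions from the table counts above:

p_grad <- 104 / 438                 # proportion of "no opinion" among college graduates
p_no_grad <- 131 / 389              # proportion of "no opinion" among non-college graduates
p_pooled <- (104 + 131) / 827       # pooled proportion under the null hypothesis
se <- sqrt(p_pooled * (1 - p_pooled) * (1 / 438 + 1 / 389))
(p_grad - p_no_grad) / se           # roughly -3.16, matching the infer output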

      +

      The \(p\)-value—the probability of observing a \(Z\) value of -3.16 or more extreme in our null distribution—is 0.0016. This can also be calculated in R directly:

      +
      2 * pnorm(-3.16, lower.tail = TRUE)
      +
      [1] 0.00158
      +
      +
      +
      +

      B.4.8 State conclusion

      +

      We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that a statistically significant difference did not exist in the proportions of no opinion on offshore drilling between college educated and non-college educated Californians was not validated. We do have evidence to suggest that there is a dependency between college graduation and position on offshore drilling for Californians.

      +
      +
      +
      +

      B.4.9 Comparing results

      +

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met gives us reason to expect that the traditional (formula-based) and non-traditional (computation-based) methods will lead to similar results.

      +
      +
      +
      +
      +
      +

      B.5 Two means (independent samples)

      +
      +

      B.5.1 Problem statement

      +

Average income varies from one region of the country to another, and it often reflects both lifestyles and regional living expenses. Suppose a new graduate is considering a job in two locations, Cleveland, OH and Sacramento, CA, and he wants to see whether the average income in one of these cities is higher than the other. He would like to conduct a hypothesis test based on two randomly selected samples from the 2000 Census. (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 5])

      +
      +
      +

      B.5.2 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: There is no association between income and location (Cleveland, OH and Sacramento, CA).

• Alternative hypothesis: There is an association between income and location (Cleveland, OH and Sacramento, CA).
      +
      +
      +

      Another way in words

      +
• Null hypothesis: The mean income is the same for both cities.

• Alternative hypothesis: The mean income is different for the two cities.
      +
      +
      +

      In symbols (with annotations)

      +
• \(H_0: \mu_{sac} = \mu_{cle}\) or \(H_0: \mu_{sac} - \mu_{cle} = 0\), where \(\mu\) represents the average income.

• \(H_A: \mu_{sac} - \mu_{cle} \ne 0\)
      +
      +
      +

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      +
      +
      +
      +

      B.5.3 Exploring the sample data

      +
      cle_sac <- read.delim("https://moderndive.com/data/cleSac.txt") %>%
      +  rename(metro_area = Metropolitan_area_Detailed,
      +         income = Total_personal_income) %>%
      +  na.omit()
      +
      inc_summ <- cle_sac %>% group_by(metro_area) %>%
      +  summarize(sample_size = n(),
      +    mean = mean(income),
      +    sd = sd(income),
      +    minimum = min(income),
      +    lower_quartile = quantile(income, 0.25),
      +    median = median(income),
      +    upper_quartile = quantile(income, 0.75),
      +    max = max(income))
      +kable(inc_summ)
metro_area      sample_size   mean     sd  minimum  lower_quartile  median  upper_quartile     max
Cleveland_ OH           212  27467  27681        0            8475   21000           35275  152400
Sacramento_ CA          175  32428  35774        0            8050   20000           49350  206900
      +

      The boxplot below also shows the mean for each group highlighted by the red dots.

      +
      ggplot(cle_sac, aes(x = metro_area, y = income)) +
      +  geom_boxplot() +
      +  stat_summary(fun.y = "mean", geom = "point", color = "red")
      +

      +
      +

      Guess about statistical significance

      +

      We are looking to see if a difference exists in the mean income of the two levels of the explanatory variable. Based solely on the boxplot, we have reason to believe that no difference exists. The distributions of income seem similar and the means fall in roughly the same place.

      +
      +
      +
      +
      +

      B.5.4 Non-traditional methods

      +
      +

      Collecting summary info

      +

      We now compute the observed statistic:

      +
      d_hat <- cle_sac %>% 
      +  specify(income ~ metro_area) %>% 
      +  calculate(stat = "diff in means", 
      +            order = c("Sacramento_ CA", "Cleveland_ OH"))
      +d_hat
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1 4960.
      +
      +
      +

      Randomization for hypothesis test

      +

In order to see if the observed sample mean for Sacramento of 32427.543 is statistically different from that for Cleveland of 27467.066, we need to account for the sample sizes. Note that this is the same as looking to see if \(\bar{x}_{sac} - \bar{x}_{cle}\) is statistically different than 0. We also need to determine a process that replicates how the original group sizes of 212 and 175 were selected.

      +

      We can use the idea of randomization testing (also known as permutation testing) to simulate the population from which the sample came (with two groups of different sizes) and then generate samples using shuffling from that simulated population to account for sampling variability.

      +
      set.seed(2018)
      +null_distn_two_means <- cle_sac %>% 
      +  specify(income ~ metro_area) %>% 
      +  hypothesize(null = "independence") %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "diff in means",
      +            order = c("Sacramento_ CA", "Cleveland_ OH"))
      +
      null_distn_two_means %>% visualize()
      +

      +

      We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are greater than or equal to 4960.477 or less than or equal to -4960.477 for our \(p\)-value.

      +
      null_distn_two_means %>% 
      +  visualize(obs_stat = d_hat, direction = "both")
      +

      +
      +
      Calculate \(p\)-value
      +
      pvalue <- null_distn_two_means %>% 
      +  get_pvalue(obs_stat = d_hat, direction = "both")
      +pvalue
      +
      # A tibble: 1 x 1
      +  p_value
      +    <dbl>
      +1   0.124
      +

So our \(p\)-value is 0.124 and we fail to reject the null hypothesis at the 5% level. You can also see from the histogram above that we are not very far into the tails of the null distribution.

      +
      +
      +
      +

      Bootstrapping for confidence interval

      +

We can also create a confidence interval for the unknown population parameter \(\mu_{sac} - \mu_{cle}\) using our sample data with bootstrapping. Here we will bootstrap each of the groups with replacement instead of shuffling. This is done using the groups argument in the resample function to fix the size of each group to be the same as the original group sizes of 175 for Sacramento and 212 for Cleveland.

      +
      boot_distn_two_means <- cle_sac %>% 
      +  specify(income ~ metro_area) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "diff in means",
      +            order = c("Sacramento_ CA", "Cleveland_ OH"))
      +
      ci <- boot_distn_two_means %>% 
      +  get_ci()
      +ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1 -1446.  11308.
      +
      boot_distn_two_means %>% 
      +  visualize(endpoints = ci, direction = "between")
      +

      +

      We see that 0 is contained in this confidence interval as a plausible value of \(\mu_{sac} - \mu_{cle}\) (the unknown population parameter). This matches with our hypothesis test results of failing to reject the null hypothesis. Since zero is a plausible value of the population parameter, we do not have evidence that Sacramento incomes are different than Cleveland incomes.

      +

Interpretation: We are 95% confident the true mean yearly income for those living in Sacramento is between 1445.53 dollars lower and 11307.82 dollars higher than for Cleveland.

      +

      Note: You could also use the null distribution based on randomization with a shift to have its center at \(\bar{x}_{sac} - \bar{x}_{cle} = \$4960.48\) instead of at 0 and calculate its percentiles. The confidence interval produced via this method should be comparable to the one done using bootstrapping above.

      +
      +
      +
      +
      +

      B.5.5 Traditional methods

      +
      +
      Check conditions
      +

      Remember that in order to use the short-cut (formula-based, theoretical) approach, we need to check that some conditions are met.

      +
1. Independent observations: The observations are independent in both groups.

   This condition is met since the cases are randomly selected from each city.

2. Approximately normal: The distribution of the response for each group should be normal or the sample sizes should be at least 30.
      +
      ggplot(cle_sac, aes(x = income)) +
      +  geom_histogram(color = "white", binwidth = 20000) +
      +  facet_wrap(~ metro_area)
      +

      +

We have some reason to doubt the normality assumption here since both histograms show some deviation from a normal model. The sample sizes for each group are greater than 100 though, so the condition should still be met.

      +
3. Independent samples: The samples should be collected without any natural pairing.

   There is no mention of there being a relationship between those selected in Cleveland and in Sacramento.
      +
      +
      +
      +

      B.5.6 Test statistic

      +

      The test statistic is a random variable based on the sample data. Here, we are interested in seeing if our observed difference in sample means (\(\bar{x}_{sac, obs} - \bar{x}_{cle, obs}\) = 4960.477) is statistically different than 0. Assuming that conditions are met and the null hypothesis is true, we can use the \(t\) distribution to standardize the difference in sample means (\(\bar{X}_{sac} - \bar{X}_{cle}\)) using the approximate standard error of \(\bar{X}_{sac} - \bar{X}_{cle}\) (invoking \(S_{sac}\) and \(S_{cle}\) as estimates of unknown \(\sigma_{sac}\) and \(\sigma_{cle}\)).

      +

      \[ T =\dfrac{ (\bar{X}_1 - \bar{X}_2) - 0}{ \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}} } \sim t (df = min(n_1 - 1, n_2 - 1)) \] where 1 = Sacramento and 2 = Cleveland with \(S_1^2\) and \(S_2^2\) the sample variance of the incomes of both cities, respectively, and \(n_1 = 175\) for Sacramento and \(n_2 = 212\) for Cleveland.

      +
      +

      Observed test statistic

      +

Note that we could also do (ALMOST) this test directly using the t.test function; its x and y arguments are expected to both be numeric vectors, so we would need to appropriately filter our datasets first. Here we instead use the infer pipeline to calculate the observed test statistic:

      +
      cle_sac %>% 
      +  specify(income ~ metro_area) %>% 
      +  calculate(stat = "t",
      +            order = c("Cleveland_ OH", "Sacramento_ CA"))
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1 -1.50
      + +

      We see here that the observed test statistic value is around -1.5.

      +

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and on understanding which formula for the test statistic applies.
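For completeness, here is a minimal sketch of that “by hand” calculation using the group summaries stored in inc_summ above (group 1 is Cleveland and group 2 is Sacramento, since the groups appear in alphabetical order):

x_bar <- inc_summ$mean
s <- inc_summ$sd
n <- inc_summ$sample_size
# (x_bar_1 - x_bar_2) / sqrt(s_1^2/n_1 + s_2^2/n_2) reproduces the statistic of about -1.50
(x_bar[1] - x_bar[2]) / sqrt(s[1]^2 / n[1] + s[2]^2 / n[2])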

      + + +
      +
      +
      +

      B.5.7 Compute \(p\)-value

      +

The \(p\)-value—the probability of observing a \(t_{174}\) value of -1.501 or more extreme (in both directions) in our null distribution—is 0.13. This can also be calculated in R directly:

      +
      2 * pt(-1.501, df = min(212 - 1, 175 - 1), lower.tail = TRUE)
      +
      [1] 0.135
      +

      We can also approximate by using the standard normal curve:

      +
      2 * pnorm(-1.501)
      +
      [1] 0.133
      +

Note that a theory-based 95 percent confidence interval for the difference in means (for example, from t.test) would be expected to match well with the one calculated using bootstrapping.

      +
      +
      +

      B.5.8 State conclusion

      +

We, therefore, do not have sufficient evidence to reject the null hypothesis. Our initial guess that a statistically significant difference did not exist in the means was backed by this statistical analysis. We do not have evidence to suggest that the true mean income differs between Cleveland, OH and Sacramento, CA based on this data.

      +
      +
      +
      +

      B.5.9 Comparing results

      +

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met gives us reason to expect that the traditional (formula-based) and non-traditional (computation-based) methods will lead to similar results.

      +
      +
      +
      +
      +
      +

      B.6 Two means (paired samples)

      +
      +

      Problem statement

      +

      Trace metals in drinking water affect the flavor and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water at 10 randomly selected locations on a stretch of river. Do the data suggest that the true average concentration in the surface water is smaller than that of bottom water? (Note that units are not given.) [Tweaked a bit from https://onlinecourses.science.psu.edu/stat500/node/51]

      +
      +
      +

      B.6.1 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: The mean concentration in the bottom water is the same as that of the surface water at different paired locations.

• Alternative hypothesis: The mean concentration in the surface water is smaller than that of the bottom water at different paired locations.
      +
      +
      +

      In symbols (with annotations)

      +
• \(H_0: \mu_{diff} = 0\), where \(\mu_{diff}\) represents the mean difference in concentration for surface water minus bottom water.

• \(H_A: \mu_{diff} < 0\)
      +
      +
      +

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      +
      +
      +
      +

      B.6.2 Exploring the sample data

      +
      zinc_tidy <- read_csv("https://moderndive.com/data/zinc_tidy.csv")
      +

      We want to look at the differences in surface - bottom for each location:

      +
      zinc_diff <- zinc_tidy %>% 
      +  group_by(loc_id) %>% 
      +  summarize(pair_diff = diff(concentration)) %>% 
      +  ungroup()
      +

      Next we calculate the mean difference as our observed statistic:

      +
      d_hat <- zinc_diff %>% 
      +  specify(response = pair_diff) %>% 
      +  calculate(stat = "mean")
      +d_hat
      +
      # A tibble: 1 x 1
      +     stat
      +    <dbl>
      +1 -0.0804
      +

      The histogram below also shows the distribution of pair_diff.

      +
      ggplot(zinc_diff, aes(x = pair_diff)) +
      +  geom_histogram(binwidth = 0.04, color = "white")
      +

      +
      +

      Guess about statistical significance

      +

We are looking to see if the sample paired mean difference of -0.08 is statistically less than 0. The observed difference seems quite close to 0, and we only have a small number of pairs here. Let’s guess that we will fail to reject the null hypothesis.

      +
      +
      +
      +
      +

      B.6.3 Non-traditional methods

      +
      +

      Bootstrapping for hypothesis test

      +

In order to see if the observed sample mean difference \(\bar{x}_{diff} = -0.0804\) is statistically less than 0, we need to account for the number of pairs. We also need to determine a process that replicates how the paired data was selected in a way similar to how we calculated our original difference in sample means.

      +

      Treating the differences as our data of interest, we next use the process of bootstrapping to build other simulated samples and then calculate the mean of the bootstrap samples. We hypothesize that the mean difference is zero.

      +

      This process is similar to comparing the One Mean example seen above, but using the differences between the two groups as a single sample with a hypothesized mean difference of 0.

      +
      set.seed(2018)
      +null_distn_paired_means <- zinc_diff %>% 
      +  specify(response = pair_diff) %>% 
      +  hypothesize(null = "point", mu = 0) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "mean")
      +
      null_distn_paired_means %>% visualize()
      +

      +

We can next use this distribution to observe our \(p\)-value. Recall this is a left-tailed test so we will be looking for values that are less than or equal to -0.0804 for our \(p\)-value.

      +
      null_distn_paired_means %>% 
      +  visualize(obs_stat = d_hat, direction = "less")
      +

      +
      +
      Calculate \(p\)-value
      +
      pvalue <- null_distn_paired_means %>% 
      +  get_pvalue(obs_stat = d_hat, direction = "less")
      +pvalue
      +
      # A tibble: 1 x 1
      +  p_value
      +    <dbl>
      +1       0
      +

So our \(p\)-value is essentially 0 and we reject the null hypothesis at the 5% level. You can also see from the histogram above that we are far into the left tail of the null distribution.

      +
      +
      +
      +

      Bootstrapping for confidence interval

      +

We can also create a confidence interval for the unknown population parameter \(\mu_{diff}\) using our sample data (the calculated differences) with bootstrapping. This is similar to the bootstrapping done in a one sample mean case, except now our data is differences instead of raw numerical data. Note that this code is identical to the pipeline shown in the hypothesis test above except the hypothesize() function is not called.

      +
      boot_distn_paired_means <- zinc_diff %>% 
      +  specify(response = pair_diff) %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "mean")
      +
      ci <- boot_distn_paired_means %>% 
      +  get_ci()
      +ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1 -0.112 -0.0503
      +
      boot_distn_paired_means %>% 
      +  visualize(endpoints = ci, direction = "between")
      +

      +

      We see that 0 is not contained in this confidence interval as a plausible value of \(\mu_{diff}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter and since the entire confidence interval falls below zero, we have evidence that surface zinc concentration levels are lower, on average, than bottom level zinc concentrations.

      +

Interpretation: We are 95% confident the true mean zinc concentration on the surface is between 0.05 and 0.11 units smaller than on the bottom.

      +
      +
      +
      +
      +

      B.6.4 Traditional methods

      +
      +

      Check conditions

      +

      Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

      +
1. Independent observations: The observations among pairs are independent.

   The locations are selected independently through random sampling so this condition is met.

2. Approximately normal: The distribution of the population of differences is normal or the number of pairs is at least 30.

   The histogram above does show some skew so we have reason to doubt the population being normal based on this sample. We also only have 10 pairs, which is fewer than the 30 needed. A theory-based test may not be valid here.
      +
      +
      +

      Test statistic

      +

The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population mean difference \(\mu_{diff}\). A good guess is the sample mean difference \(\bar{X}_{diff}\). Recall that this sample mean is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely it is for us to have observed a sample mean difference of \(\bar{x}_{diff, obs} = -0.0804\) or smaller assuming that the population mean difference is 0 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can “standardize” this original test statistic of \(\bar{X}_{diff}\) into a \(T\) statistic that follows a \(t\) distribution with degrees of freedom equal to \(df = n - 1\):

      +

      \[ T =\dfrac{ \bar{X}_{diff} - 0}{ S_{diff} / \sqrt{n} } \sim t (df = n - 1) \]

      +

where \(S_{diff}\) represents the standard deviation of the sample differences and \(n\) is the number of pairs.

      +
      +
      Observed test statistic
      +

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and on understanding which formula for the test statistic applies. We can use the t_test function on the differences to perform this analysis for us.

      +
      t_test_results <- zinc_diff %>% 
      +  infer::t_test(formula = pair_diff ~ NULL, 
      +         alternative = "less",
      +         mu = 0)
      +t_test_results
      +
      # A tibble: 1 x 6
      +  statistic  t_df  p_value alternative lower_ci upper_ci
      +      <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>
      +1     -4.86     9 0.000446 less            -Inf  -0.0501
      +

      We see here that the \(t_{obs}\) value is -4.864.
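Here is a minimal sketch of the corresponding “by hand” calculation (assuming the zinc_diff data frame created above):

d_bar <- mean(zinc_diff$pair_diff)
s_diff <- sd(zinc_diff$pair_diff)
n_pairs <- nrow(zinc_diff)
# d_bar / (s_diff / sqrt(n)) reproduces the t statistic of about -4.86
d_bar / (s_diff / sqrt(n_pairs))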

      +
      +
      +
      +

      Compute \(p\)-value

      +

The \(p\)-value—the probability of observing a \(t_{obs}\) value of -4.864 or less in our null distribution of a \(t\) with 9 degrees of freedom—is essentially 0. This can also be calculated in R directly:

      +
      pt(-4.8638, df = nrow(zinc_diff) - 1, lower.tail = TRUE)
      +
      [1] 0.000446
      +
      +
      +

      State conclusion

      +

      We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample mean difference was not statistically less than the hypothesized mean of 0 has been invalidated here. Based on this sample, we have evidence that the mean concentration in the bottom water is greater than that of the surface water at different paired locations.

      +
      +
      +
      +
      +

      B.6.5 Comparing results

      +

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions were not met since the number of pairs was small, but the sample data was not highly skewed. Using any of the methods, whether traditional (formula-based) or non-traditional (computation-based), leads to similar results here.

diff --git a/docs/previous_versions/v0.4.0/C-appendixC.html b/docs/previous_versions/v0.4.0/C-appendixC.html
new file mode 100644
index 000000000..c8c82b6fe
--- /dev/null
+++ b/docs/previous_versions/v0.4.0/C-appendixC.html
@@ -0,0 +1,693 @@

      C Reach for the Stars

      +
      +

      Needed packages

      +
      library(dplyr)
      +library(ggplot2)
      +library(knitr)
      +library(dygraphs)
      +library(nycflights13)
      +
      +
      +

      C.1 Sorted barplots

      +

      Building upon the example in Section 3.8:

      +
      flights_table <- table(flights$carrier)
      +flights_table
      +
      
      +   9E    AA    AS    B6    DL    EV    F9    FL    HA    MQ    OO    UA    US 
      +18460 32729   714 54635 48110 54173   685  3260   342 26397    32 58665 20536 
      +   VX    WN    YV 
      + 5162 12275   601 
      +

      We can sort this table from highest to lowest counts by using the sort function:

      +
      sorted_flights <- sort(flights_table, decreasing = TRUE)
      +names(sorted_flights)
      +
       [1] "UA" "B6" "EV" "DL" "AA" "MQ" "US" "9E" "WN" "VX" "FL" "AS" "F9" "YV" "HA"
      +[16] "OO"
      +

      It is often preferred for barplots to be ordered corresponding to the heights of the bars. This allows the reader to more easily compare the ordering of different airlines in terms of departed flights (Robbins 2013). We can also much more easily answer questions like “How many airlines have more departing flights than Southwest Airlines?”.

      +

We can use the sorted table of flight counts, stored as sorted_flights, to reorder the levels of carrier on the x-axis.

      +
      ggplot(data = flights, mapping = aes(x = carrier)) +
      +  geom_bar() +
      +  scale_x_discrete(limits = names(sorted_flights))
      +
Figure C.1: Number of flights departing NYC in 2013 by airline - Descending numbers

      +
      +

      The last addition here specifies the values of the horizontal x axis on a discrete scale to correspond to those given by the entries of sorted_flights.

      +
      +
      +

      C.2 Interactive graphics

      +
      +

      C.2.1 Interactive linegraphs

      +

      Another useful tool for viewing linegraphs such as this is the dygraph function in the dygraphs package in combination with the dyRangeSelector function. This allows us to zoom in on a selected range and get an interactive plot for us to work with:

      +
      library(dygraphs)
      +flights_day <- mutate(flights, date = as.Date(time_hour))
      +flights_summarized <- flights_day %>% 
      +  group_by(date) %>%
      +  summarize(median_arr_delay = median(arr_delay, na.rm = TRUE))
      +rownames(flights_summarized) <- flights_summarized$date
      +flights_summarized <- select(flights_summarized, -date)
      +dyRangeSelector(dygraph(flights_summarized))
      +
      + +


      +

      The syntax here is a little different than what we have covered so far. The dygraph function is expecting for the dates to be given as the rownames of the object. We then remove the date variable from the flights_summarized data frame since it is accounted for in the rownames. Lastly, we run the dygraph function on the new data frame that only contains the median arrival delay as a column and then provide the ability to have a selector to zoom in on the interactive plot via dyRangeSelector. (Note that this plot will only be interactive in the HTML version of this book.)

diff --git a/docs/previous_versions/v0.4.0/data/ageAtMar.csv b/docs/previous_versions/v0.4.0/data/ageAtMar.csv
new file mode 100755
index 000000000..b68e12a1b
--- /dev/null
+++ b/docs/previous_versions/v0.4.0/data/ageAtMar.csv
@@ -0,0 +1,5535 @@
+age
(5,534 age-at-first-marriage values omitted)
+23 +21 +22 +27 +16 +23 +20 +20 +20 +19 +19 +24 +18 +21 +23 +20 +21 +20 +20 +18 +20 +20 +22 +18 +33 +18 +15 +17 +21 +16 +36 +31 +17 +21 +17 +18 +17 +19 +23 +18 +17 +20 +16 +23 +20 +24 +18 +14 +17 +14 +17 +23 +18 +24 +23 +25 +20 +18 +15 +22 +26 +17 +19 +18 +34 +24 +18 +27 +20 +20 +19 +19 +20 +19 +25 +27 +23 +28 +22 +27 +24 +26 +15 +26 +28 +24 +33 +24 +23 +16 +30 +21 +22 +26 +23 +18 +28 +26 +31 +22 +27 +21 +20 +19 +29 +16 +24 +26 +21 +25 +19 +26 +29 +28 +24 +29 +28 +21 +17 +22 +26 +19 +34 +26 +19 +29 +24 +30 +16 +24 +25 +22 +24 +19 +22 +21 +23 +30 +20 +22 +27 +27 +28 +23 +24 +17 +31 +25 +25 +25 +22 +23 +17 +25 +29 +33 +19 +24 +33 +18 +27 +30 +15 +30 +17 +21 +25 +18 +28 +22 +23 +20 +18 +19 +32 +24 +25 +23 +26 +30 +24 +25 +25 +20 +24 +19 +22 +31 +26 +28 +28 +24 +19 +26 +18 +25 +17 +34 +19 +28 +20 +21 +21 +18 +18 +19 +21 +34 +20 +24 +16 +20 +22 +22 +21 +24 +23 +20 +19 +17 +19 +21 +33 +25 +18 +17 +29 +27 +27 +33 +22 +22 +23 +13 +25 +24 +21 +21 +32 +20 +21 +28 +20 +29 +25 +25 +28 +34 +26 +25 +24 +21 +25 +20 +21 +27 +27 +18 +23 +14 +27 +22 +24 +21 +26 +24 +23 +19 +20 +22 +22 +20 +30 +23 +28 +19 +21 +23 +26 +19 +27 +27 +22 +24 +25 +36 +19 +34 +35 +26 +21 +23 +33 +20 +23 +26 +21 +19 +24 +20 +28 +21 +37 +26 +21 +18 +20 +18 +43 +25 +19 +28 +19 +20 +25 +20 +21 +15 +21 +20 +21 +19 +29 +22 +22 +18 +20 +29 +29 +23 +27 +21 +20 +18 +35 +25 +23 +24 +18 +20 +19 +18 +16 +37 +26 +24 +33 +35 +23 +20 +22 +14 +24 +19 +19 +18 +29 +15 +17 +37 +22 +25 +19 +20 +32 +21 +19 +29 +21 +23 +16 +24 +20 +22 +18 +18 +19 +23 +39 +21 +19 +22 +24 +25 +28 +18 +16 +18 +21 +21 +18 +18 +20 +24 +23 +15 +19 +19 +22 +23 +27 +26 +25 +24 +22 +18 +17 +18 +26 +18 +24 +18 +23 +20 +24 +24 +21 +27 +27 +35 +24 +25 +23 +20 +24 +20 +25 +21 +24 +23 +25 +21 +20 +21 +20 +32 +24 +18 +28 +16 +19 +18 +23 +24 +25 +20 +23 +20 +29 +23 +18 +21 +21 +23 +23 +21 +22 +22 +21 +20 +28 +21 +22 +21 +21 +24 +20 +28 +17 +21 +18 +20 +19 +20 +23 +33 +19 +18 +25 +23 +24 +19 +23 +25 +21 +26 +27 +19 +28 +20 +34 +25 +20 +19 +22 +22 +30 +21 +24 +18 +20 +15 +19 +23 +24 +36 +18 +27 +21 +17 +21 +26 +18 +24 +31 +14 +30 +26 +23 +19 +16 +23 +19 +20 +28 +23 +23 +33 +34 +32 +21 +20 +18 +25 +26 +24 +27 +17 +31 +38 +22 +31 +20 +25 +23 +15 +24 +21 +20 +19 +15 +23 +24 +28 +20 +28 +27 +19 +24 +25 +19 +25 +29 +25 +22 +21 +26 +21 +25 +21 +18 +27 +25 +23 +22 +23 +23 +24 +22 +21 +22 +20 +23 +25 +23 +21 +20 +21 +21 +22 +26 +25 +18 +18 +25 +24 +20 +26 +21 +20 +23 +20 +20 +17 +17 +19 +23 +20 +19 +19 +21 +20 +26 +22 +22 +24 +28 +22 +25 +22 +19 +20 +21 +21 +21 +22 +28 +20 +21 +25 +22 +24 +30 +21 +19 +21 +24 +27 +21 +19 +22 +15 +18 +20 +21 +19 +22 +16 +21 +18 +23 +19 +19 +21 +24 +27 +21 +27 +24 +31 +20 +26 +20 +21 +18 +24 +21 +24 +19 +18 +24 +23 +33 +25 +22 +30 +28 +21 +29 +25 +25 +29 +27 +25 +27 +27 +25 +27 +27 +20 +22 +27 +36 +19 +16 +24 +18 +27 +26 +19 +23 +22 +22 +24 +24 +22 +26 +29 +23 +25 +25 +21 +24 +21 +24 +22 +17 +24 +26 +25 +19 +21 +21 +20 +20 +22 +24 +30 +26 +24 +29 +22 +28 +27 +32 +23 +19 +24 +28 +30 +25 +33 +30 +21 +18 +29 +32 +28 +34 +21 +22 +30 +21 +24 +25 +33 +18 +23 +24 +34 +26 +25 +22 +23 +26 +33 +27 +24 +25 +22 +29 +19 +26 +22 +23 +19 +18 +15 +20 +24 +18 +18 +21 +18 +18 +18 +19 +17 +31 +20 +16 +24 +20 +25 +25 +22 +18 +18 +26 +23 +40 +20 +19 +21 +19 +21 +23 +19 +25 +20 +22 +24 +20 +23 +29 +20 +23 +23 +19 +23 +25 +23 +24 +25 +22 +28 +23 +28 +23 +16 +24 +23 +20 +27 +25 +20 +25 +30 +31 +23 +19 +29 +18 +25 +22 +22 +20 +13 +38 +18 +22 +19 +20 +18 +28 +16 +25 +19 +24 +21 +21 +19 +18 +21 +21 +18 +21 +24 +17 +21 +20 +19 +19 +18 +24 +18 +25 +28 +18 +27 +19 +27 +19 +31 +19 +28 +21 +17 
+29 +21 +18 +26 +24 +31 +25 +23 +27 +22 +26 +27 +23 +20 +20 +27 +29 +21 +23 +35 +27 +19 +31 +34 +19 +23 +26 +27 +17 +19 +18 +19 +19 +20 +23 +24 +20 +21 +17 +18 +23 +21 +21 +24 +16 +19 +19 +16 +21 +17 +24 +19 +16 +21 +16 +22 +25 +42 +25 +22 +16 +25 +17 +23 +30 +31 +23 +26 +24 +18 +23 +28 +21 +21 +18 +19 +27 +21 +18 +24 +14 +21 +26 +28 +18 +19 +18 +36 +22 +21 +17 +18 +30 +21 +22 +23 +20 +21 +22 +26 +25 +22 +29 +21 +23 +18 +18 +25 +23 +19 +18 +29 +27 +22 +26 +26 +17 +26 +22 +30 +26 +16 +28 +26 +20 +19 +18 +23 +22 +35 +26 +21 +22 +23 +24 +23 +20 +22 +25 +21 +24 +33 +18 +22 +25 +33 +19 +20 +24 +24 +24 +28 +20 +32 +21 +23 +26 +25 +24 +23 +24 +30 +22 +28 +30 +19 +30 +23 +28 +20 +24 +28 +19 +22 +18 +24 +25 +22 +30 +24 +24 +19 +30 +27 +23 +32 +23 +29 +25 +17 +19 +18 +19 +18 +24 +22 +28 +24 +21 +27 +22 +23 +28 +24 +18 +23 +20 +22 +22 +17 +23 +23 +28 +22 +20 +24 +24 +24 +22 +26 +26 +33 +20 +21 +30 +26 +26 +21 +19 +20 +24 +34 +21 +18 +19 +23 +26 +29 +19 +25 +21 +22 +26 +28 +27 +27 +19 +22 +24 +20 +25 +18 +21 +21 +20 +19 +20 +26 +24 +20 +18 +27 +19 +21 +24 +23 +21 +27 +20 +26 +21 +18 +20 +23 +23 +24 +29 +20 +21 +18 +25 +22 +29 +18 +19 +30 +18 +25 +20 +22 +24 +27 +25 +25 +22 +18 +17 +19 +27 +28 +26 +20 +22 +24 +23 +23 +25 +20 +23 +27 +20 +24 +23 +25 +24 +19 +18 +22 +24 +23 +15 +19 +18 +22 +16 +18 +35 +22 +22 +20 +25 +20 +20 +25 +22 +37 +21 +18 +19 +18 +18 +27 +21 +24 +20 +20 +19 +22 +22 +23 +20 +18 +19 +22 +25 +25 +25 +20 +18 +20 +24 +21 +18 +19 +19 +21 +19 +20 +27 +27 +23 +24 +22 +19 +20 +22 +18 +19 +29 +16 +38 +24 +19 +23 +14 +36 +25 +19 +23 +30 +26 +28 +26 +26 +15 +22 +21 +20 +22 +21 +22 +19 +28 +18 +33 +25 +16 +24 +19 +24 +20 +24 +21 +25 +21 +20 +28 +19 +21 +24 +18 +18 +31 +18 +20 +19 +23 +19 +23 +25 +20 +24 +20 +21 +26 +22 +22 +25 +24 +21 +23 +25 +24 +18 +23 +25 +18 +26 +24 +21 +25 +23 +22 +28 +21 +24 +20 +26 +25 +19 +20 +24 +16 +25 +26 +31 +26 +20 +29 +23 +19 +24 +27 +22 +27 +23 +22 +24 +20 +19 +26 +23 +21 +19 +20 +31 +17 +18 +21 +17 +22 +22 +26 +26 +22 +18 +15 +19 +26 +23 +20 +15 +23 +18 +22 +21 +21 +21 +27 +19 +20 +28 +21 +39 +26 +22 +20 +24 +20 +20 +28 +30 +18 +22 +28 +20 +19 +19 +20 +27 +18 +24 +21 +20 +20 +32 +20 +22 +18 +22 +18 +30 +17 +17 +20 +23 +17 +24 +24 +16 +20 +20 +24 +26 +22 +19 +21 +28 +21 +26 +26 +17 +27 +26 +19 +33 +22 +18 +21 +21 +24 +16 +20 +22 +14 +22 +21 +21 +19 +24 +39 +20 +16 +25 +20 +26 +29 +23 +29 +26 +20 +20 +36 +30 +24 +23 +30 +27 +29 +26 +25 +23 +24 +28 +27 +18 +32 +18 +23 +19 +21 +21 +17 +27 +19 +26 +24 +21 +21 +27 +23 +23 +23 +23 +25 +21 +27 +20 +23 +21 +27 +20 +23 +23 +18 +16 +19 +19 +37 +19 +23 +22 +27 +26 +19 +22 +24 +19 +16 +17 +20 +22 +23 +18 +24 +19 +17 +29 +25 +21 +23 +23 +20 +19 +17 +21 +15 +24 +25 +18 +20 +23 +20 +22 +19 +27 +15 +24 +19 +16 +19 +16 +15 +14 +18 +16 +19 +17 +19 +18 +16 +18 +21 +18 +42 +20 +17 +17 +19 +18 +28 +16 +31 +29 +26 +28 +18 +17 +17 +17 +30 +23 +25 +19 +20 +19 +20 +20 +25 +26 +20 +24 +18 +27 +25 +20 +20 +22 +19 +25 +30 +22 +17 +19 +19 +21 +36 +17 +25 +17 +13 +20 +28 +21 +21 +26 +40 +24 +25 +33 +23 +35 +23 +19 +22 +18 +23 +27 +31 +19 +23 +27 +22 +18 +19 +18 +22 +21 +22 +37 +19 +22 +25 +27 +38 +33 +19 +23 +17 +41 +20 +20 +21 +34 +20 +20 +20 +15 +20 +30 +23 +16 +28 +18 +21 +16 +18 +18 +18 +26 +18 +18 +21 +20 +21 +18 +20 +17 +21 +21 +18 +22 +15 +22 +18 +22 +20 +24 +20 +17 +29 +25 +18 +23 +21 +18 +18 +21 +18 +23 +25 +20 +20 +20 +17 +20 +25 +18 +25 +24 +18 +20 +19 +27 +28 +21 +22 +28 +16 +17 +16 +19 +17 +29 +21 +22 +21 +18 +22 +27 +26 +22 +20 +20 +24 +19 +22 +18 +32 +21 +19 +21 +15 +28 +20 +25 +19 +24 +19 +19 +33 +39 +18 +21 +25 +19 +19 +23 
+21 +29 +19 +24 +22 +25 +21 +18 +24 +18 +21 +20 +18 +23 +33 +21 +19 +18 +26 +21 +17 +18 +34 +18 +21 +18 +19 +17 +32 +24 +21 +24 +20 +18 +22 +20 +17 +23 +21 +19 +26 +23 +26 +21 +23 +15 +21 +17 +28 +20 +28 +20 +22 +22 +24 +17 +32 +24 +16 +24 +23 +20 +27 +22 +42 +28 +18 +31 +22 +22 +19 +19 +22 +32 +15 +27 +23 +23 +18 +18 +22 +25 +20 +22 +22 +17 +21 +17 +20 +15 +19 +18 +26 +25 +18 +24 +26 +22 +18 +22 +17 +31 +18 +21 +31 +20 +26 +27 +25 +26 +27 +19 +18 +24 +18 +22 +23 +28 +28 +23 +26 +29 +28 +18 +20 +20 +15 +18 +23 +26 +20 +20 +23 +26 +19 +19 +20 +25 +21 +21 +24 +19 +20 +16 +14 +24 +19 +28 +20 +25 +31 +21 +22 +23 +19 +24 +19 +20 +19 +20 +22 +22 +27 +22 +26 +22 +14 +19 +18 +20 +27 +20 +20 +21 +21 +24 +24 +16 +25 +27 +22 +21 +31 +26 +20 +17 +21 +20 +19 +19 +21 +16 +21 +33 +22 +19 +25 +23 +23 +21 +22 +27 +20 +21 +23 +17 +23 +18 +28 +25 +23 +31 +35 +23 +20 +18 +24 +31 +19 +32 +19 +30 +19 +26 +19 +22 +16 +19 +21 +21 +40 +23 +26 +17 +20 +17 +31 +21 +22 +22 +18 +17 +22 +24 +25 +25 +23 +24 +23 +30 +21 +25 +32 +23 +27 +26 +22 +25 +34 +16 +22 +22 +18 +23 +23 +20 +28 +26 +26 +19 +34 +22 +28 +19 +24 +21 +19 diff --git a/docs/previous_versions/v0.4.0/data/cleSac.txt b/docs/previous_versions/v0.4.0/data/cleSac.txt new file mode 100755 index 000000000..20e7da082 --- /dev/null +++ b/docs/previous_versions/v0.4.0/data/cleSac.txt @@ -0,0 +1 @@ +Census_year State_FIPS_code Metropolitan_area_Detailed Age Sex Race_General Marital_status Total_personal_income 2000 California Sacramento_ CA 56 Male Japanese Married_ spouse present 40240 2000 California Sacramento_ CA 53 Female White Married_ spouse present 13600 2000 California Sacramento_ CA 17 Female Two major races Never married/single (N/A) 0 2000 California Sacramento_ CA 37 Female White Never married/single (N/A) 49000 2000 California Sacramento_ CA 40 Male White Never married/single (N/A) 38300 2000 California Sacramento_ CA 23 Male Other race_ nec Never married/single (N/A) 14000 2000 California Sacramento_ CA 40 Female Black/Negro Divorced 9000 2000 California Sacramento_ CA 11 Male Black/Negro Never married/single (N/A) 2000 California Sacramento_ CA 46 Male Black/Negro Married_ spouse present 40000 2000 California Sacramento_ CA 34 Female Black/Negro Married_ spouse present 18000 2000 California Sacramento_ CA 16 Male Black/Negro Never married/single (N/A) 0 2000 California Sacramento_ CA 11 Female Black/Negro Never married/single (N/A) 2000 California Sacramento_ CA 7 Female Black/Negro Never married/single (N/A) 2000 California Sacramento_ CA 23 Male White Never married/single (N/A) 65000 2000 California Sacramento_ CA 30 Female White Divorced 30000 2000 California Sacramento_ CA 35 Male White Married_ spouse present 61100 2000 California Sacramento_ CA 30 Male White Married_ spouse present 62000 2000 California Sacramento_ CA 28 Female White Married_ spouse present 5500 2000 California Sacramento_ CA 3 Female White Never married/single (N/A) 2000 California Sacramento_ CA 0 Male White Never married/single (N/A) 2000 California Sacramento_ CA 42 Male White Married_ spouse present 36000 2000 California Sacramento_ CA 17 Female White Never married/single (N/A) 0 2000 California Sacramento_ CA 6 Male Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 1 Male Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 40 Male White Married_ spouse present 50000 2000 California Sacramento_ CA 37 Female White Married_ spouse present 70000 2000 California Sacramento_ CA 9 Male White Never 
married/single (N/A) 2000 California Sacramento_ CA 7 Male White Never married/single (N/A) 2000 California Sacramento_ CA 39 Male White Divorced 34400 2000 California Sacramento_ CA 33 Male Other Asian or Pacific Islander Married_ spouse present 18000 2000 California Sacramento_ CA 37 Female Other Asian or Pacific Islander Married_ spouse present 0 2000 California Sacramento_ CA 62 Male Other Asian or Pacific Islander Married_ spouse present 3800 2000 California Sacramento_ CA 27 Male Other race_ nec Married_ spouse absent 15000 2000 California Sacramento_ CA 11 Female Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 21 Female Other race_ nec Married_ spouse absent 0 2000 California Sacramento_ CA 5 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 4 Female Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 1 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 1 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 80 Female White Widowed 55100 2000 California Sacramento_ CA 28 Female Other Asian or Pacific Islander Married_ spouse present 27000 2000 California Sacramento_ CA 0 Male Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 85 Female White Widowed 13900 2000 California Sacramento_ CA 24 Female White Never married/single (N/A) 20000 2000 California Sacramento_ CA 45 Female Black/Negro Divorced 150000 2000 California Sacramento_ CA 52 Female White Divorced 8300 2000 California Sacramento_ CA 23 Male Black/Negro Never married/single (N/A) 0 2000 California Sacramento_ CA 16 Male Black/Negro Never married/single (N/A) 0 2000 California Sacramento_ CA 43 Female Other Asian or Pacific Islander Married_ spouse present 0 2000 California Sacramento_ CA 62 Male White Married_ spouse present 42000 2000 California Sacramento_ CA 60 Female White Divorced 1400 2000 California Sacramento_ CA 52 Male White Married_ spouse present 70000 2000 California Sacramento_ CA 51 Female White Married_ spouse present 35000 2000 California Sacramento_ CA 49 Female White Divorced 66000 2000 California Sacramento_ CA 29 Male White Married_ spouse present 13500 2000 California Sacramento_ CA 4 Female White Never married/single (N/A) 2000 California Sacramento_ CA 2 Female White Never married/single (N/A) 2000 California Sacramento_ CA 49 Female Other Asian or Pacific Islander Married_ spouse present 5100 2000 California Sacramento_ CA 51 Male Other Asian or Pacific Islander Married_ spouse present 8100 2000 California Sacramento_ CA 19 Female Other Asian or Pacific Islander Never married/single (N/A) 8000 2000 California Sacramento_ CA 25 Male Other Asian or Pacific Islander Married_ spouse present 32000 2000 California Sacramento_ CA 55 Female White Married_ spouse present 51800 2000 California Sacramento_ CA 39 Female White Never married/single (N/A) 25000 2000 California Sacramento_ CA 39 Male White Married_ spouse absent 95000 2000 California Sacramento_ CA 25 Female American Indian or Alaska Native Never married/single (N/A) 32000 2000 California Sacramento_ CA 24 Female White Married_ spouse present 0 2000 California Sacramento_ CA 4 Male White Never married/single (N/A) 2000 California Sacramento_ CA 77 Male White Married_ spouse present 55000 2000 California Sacramento_ CA 63 Female Two major races Married_ spouse present 51000 2000 California Sacramento_ CA 33 Male White Married_ spouse present 20000 2000 California 
Sacramento_ CA 12 Male White Never married/single (N/A) 2000 California Sacramento_ CA 4 Male White Never married/single (N/A) 2000 California Sacramento_ CA 1 Female White Never married/single (N/A) 2000 California Sacramento_ CA 35 Male Black/Negro Divorced 850 2000 California Sacramento_ CA 44 Male White Married_ spouse present 80000 2000 California Sacramento_ CA 44 Female White Married_ spouse present 44000 2000 California Sacramento_ CA 18 Male White Never married/single (N/A) 0 2000 California Sacramento_ CA 15 Female White Never married/single (N/A) 0 2000 California Sacramento_ CA 19 Male Two major races Never married/single (N/A) 3750 2000 California Sacramento_ CA 37 Male Black/Negro Married_ spouse present 20000 2000 California Sacramento_ CA 1 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 30 Male White Married_ spouse present 36000 2000 California Sacramento_ CA 39 Male White Married_ spouse absent 55000 2000 California Sacramento_ CA 41 Female White Married_ spouse absent 0 2000 California Sacramento_ CA 36 Female White Never married/single (N/A) 32000 2000 California Sacramento_ CA 33 Female White Divorced 36000 2000 California Sacramento_ CA 18 Male Other race_ nec Never married/single (N/A) 2010 2000 California Sacramento_ CA 2 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 49 Male White Married_ spouse present 76300 2000 California Sacramento_ CA 46 Female White Married_ spouse present 41000 2000 California Sacramento_ CA 20 Female Black/Negro Never married/single (N/A) 10000 2000 California Sacramento_ CA 35 Male White Divorced 9600 2000 California Sacramento_ CA 59 Male White Divorced 54000 2000 California Sacramento_ CA 44 Female White Never married/single (N/A) 29000 2000 California Sacramento_ CA 15 Female White Never married/single (N/A) 0 2000 California Sacramento_ CA 51 Male Japanese Married_ spouse present 12000 2000 California Sacramento_ CA 19 Male White Never married/single (N/A) 2000 2000 California Sacramento_ CA 16 Male White Never married/single (N/A) 0 2000 California Sacramento_ CA 14 Female White Never married/single (N/A) 2000 California Sacramento_ CA 54 Female White Married_ spouse present 39400 2000 California Sacramento_ CA 51 Female White Married_ spouse present 0 2000 California Sacramento_ CA 12 Male White Never married/single (N/A) 2000 California Sacramento_ CA 30 Female White Married_ spouse present 40000 2000 California Sacramento_ CA 29 Male White Married_ spouse present 30000 2000 California Sacramento_ CA 0 Male White Never married/single (N/A) 2000 California Sacramento_ CA 63 Female White Married_ spouse present 22100 2000 California Sacramento_ CA 46 Female White Divorced 17900 2000 California Sacramento_ CA 26 Male White Never married/single (N/A) 20000 2000 California Sacramento_ CA 46 Female Black/Negro Divorced 23000 2000 California Sacramento_ CA 24 Male Black/Negro Never married/single (N/A) 25000 2000 California Sacramento_ CA 80 Male White Married_ spouse absent 12000 2000 California Sacramento_ CA 36 Male White Married_ spouse present 10900 2000 California Sacramento_ CA 29 Male White Married_ spouse absent 160000 2000 California Sacramento_ CA 64 Male White Divorced 14000 2000 California Sacramento_ CA 27 Female White Married_ spouse present 19600 2000 California Sacramento_ CA 29 Male White Married_ spouse present 68000 2000 California Sacramento_ CA 93 Male White Married_ spouse present 39300 2000 California Sacramento_ CA 22 Male Other Asian or Pacific 
Islander Never married/single (N/A) 12000 2000 California Sacramento_ CA 23 Male Other Asian or Pacific Islander Never married/single (N/A) 6700 2000 California Sacramento_ CA 38 Male White Divorced 50000 2000 California Sacramento_ CA 40 Female White Married_ spouse present 52490 2000 California Sacramento_ CA 39 Male White Married_ spouse present 62400 2000 California Sacramento_ CA 11 Male White Never married/single (N/A) 2000 California Sacramento_ CA 25 Female Other Asian or Pacific Islander Married_ spouse present 0 2000 California Sacramento_ CA 8 Female Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 44 Male Two major races Married_ spouse present 16000 2000 California Sacramento_ CA 39 Female Two major races Married_ spouse present 6900 2000 California Sacramento_ CA 21 Male Other race_ nec Never married/single (N/A) 4000 2000 California Sacramento_ CA 20 Male White Never married/single (N/A) 13000 2000 California Sacramento_ CA 17 Female White Never married/single (N/A) 0 2000 California Sacramento_ CA 12 Female Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 21 Male Other race_ nec Never married/single (N/A) 16700 2000 California Sacramento_ CA 37 Female Black/Negro Separated 24900 2000 California Sacramento_ CA 33 Male Black/Negro Never married/single (N/A) 16100 2000 California Sacramento_ CA 15 Male Black/Negro Never married/single (N/A) 7100 2000 California Sacramento_ CA 7 Female Black/Negro Never married/single (N/A) 2000 California Sacramento_ CA 12 Female Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 4 Female Other Asian or Pacific Islander Never married/single (N/A) 2000 California Sacramento_ CA 1 Male White Never married/single (N/A) 2000 California Sacramento_ CA 16 Male White Never married/single (N/A) 0 2000 California Sacramento_ CA 15 Male White Never married/single (N/A) 0 2000 California Sacramento_ CA 65 Male White Married_ spouse present 44800 2000 California Sacramento_ CA 71 Female White Married_ spouse present 16000 2000 California Sacramento_ CA 71 Male White Married_ spouse present 86700 2000 California Sacramento_ CA 65 Female White Married_ spouse present 3000 2000 California Sacramento_ CA 40 Female White Married_ spouse present 12500 2000 California Sacramento_ CA 12 Female White Never married/single (N/A) 2000 California Sacramento_ CA 52 Female White Married_ spouse present 16600 2000 California Sacramento_ CA 40 Female Two major races Married_ spouse present 170000 2000 California Sacramento_ CA 46 Male Two major races Married_ spouse present 70000 2000 California Sacramento_ CA 31 Male White Married_ spouse present 32000 2000 California Sacramento_ CA 8 Male White Never married/single (N/A) 2000 California Sacramento_ CA 36 Male Other race_ nec Married_ spouse present 40040 2000 California Sacramento_ CA 8 Male Other race_ nec Never married/single (N/A) 2000 California Sacramento_ CA 45 Male White Married_ spouse present 10000 2000 California Sacramento_ CA 36 Female White Married_ spouse present 34000 2000 California Sacramento_ CA 29 Male White Married_ spouse present 600 2000 California Sacramento_ CA 24 Female White Married_ spouse present 0 2000 California Sacramento_ CA 43 Male White Married_ spouse present 6000 2000 California Sacramento_ CA 2 Male White Never married/single (N/A) 2000 California Sacramento_ CA 0 Female White Never married/single (N/A) 2000 California Sacramento_ CA 37 Male Black/Negro Married_ spouse present 
50020 2000 California Sacramento_ CA 35 Female Black/Negro Married_ spouse present 50020 2000 California Sacramento_ CA 50 Male White Married_ spouse present 92000 2000 California Sacramento_ CA 49 Female White Married_ spouse present 70000 2000 California Sacramento_ CA 7 Female White Never married/single (N/A) 2000 California Sacramento_ CA 7 Female White Never married/single (N/A) 2000 California Sacramento_ CA 35 Male White Married_ spouse present 15000 2000 California Sacramento_ CA 27 Female White Married_ spouse present 16000 2000 California Sacramento_ CA 38 Female Other Asian or Pacific Islander Never married/single (N/A) 30000 2000 California Sacramento_ CA 58 Male White Married_ spouse present 134000 2000 California Sacramento_ CA 53 Female Two major races Married_ spouse present 0 2000 California Sacramento_ CA 41 Male White Married_ spouse present 170000 2000 California Sacramento_ CA 14 Female White Never married/single (N/A) 2000 California Sacramento_ CA 48 Female Two major races Married_ spouse present 0 2000 California Sacramento_ CA 33 Female Two major races Married_ spouse present 65200 2000 California Sacramento_ CA 82 Female White Widowed 49700 2000 California Sacramento_ CA 50 Male White Married_ spouse present 79100 2000 California Sacramento_ CA 47 Female White Married_ spouse present 14200 2000 California Sacramento_ CA 30 Male Two major races Married_ spouse present 20000 2000 California Sacramento_ CA 44 Female Two major races Married_ spouse present 105200 2000 California Sacramento_ CA 41 Female White Married_ spouse present 20000 2000 California Sacramento_ CA 4 Female White Never married/single (N/A) 2000 California Sacramento_ CA 1 Female White Never married/single (N/A) 2000 California Sacramento_ CA 25 Female White Never married/single (N/A) 29200 2000 California Sacramento_ CA 7 Female White Never married/single (N/A) 2000 California Sacramento_ CA 53 Female White Married_ spouse present 60000 2000 California Sacramento_ CA 19 Female White Never married/single (N/A) 2000 2000 California Sacramento_ CA 93 Male White Divorced 22600 2000 California Sacramento_ CA 32 Male White Divorced 12000 2000 California Sacramento_ CA 50 Female White Married_ spouse present 34000 2000 California Sacramento_ CA 53 Male White Married_ spouse present 24600 2000 California Sacramento_ CA 41 Male White Married_ spouse present 50000 2000 California Sacramento_ CA 38 Female White Married_ spouse present 21000 2000 California Sacramento_ CA 15 Female White Never married/single (N/A) 0 2000 California Sacramento_ CA 8 Male White Never married/single (N/A) 2000 California Sacramento_ CA 0 Male White Never married/single (N/A) 2000 California Sacramento_ CA 79 Female White Widowed 16000 2000 California Sacramento_ CA 63 Female White Widowed 206900 2000 California Sacramento_ CA 41 Male White Married_ spouse present 10000 2000 California Sacramento_ CA 40 Female White Married_ spouse present 5600 2000 California Sacramento_ CA 34 Female White Never married/single (N/A) 24500 2000 California Sacramento_ CA 11 Male White Never married/single (N/A) 2000 California Sacramento_ CA 51 Male White Married_ spouse present 84900 2000 California Sacramento_ CA 11 Female White Never married/single (N/A) 2000 California Sacramento_ CA 66 Male White Married_ spouse present 9300 2000 California Sacramento_ CA 65 Female White Married_ spouse present 5600 2000 California Sacramento_ CA 52 Male White Married_ spouse present 60400 2000 California Sacramento_ CA 31 Male White Never married/single 
(N/A) 25000 2000 California Sacramento_ CA 54 Female White Divorced 25000 2000 California Sacramento_ CA 7 Male White Never married/single (N/A) 2000 California Sacramento_ CA 5 Female White Never married/single (N/A) 2000 California Sacramento_ CA 43 Female White Married_ spouse present 5000 2000 California Sacramento_ CA 12 Male White Never married/single (N/A) 2000 California Sacramento_ CA 44 Male White Never married/single (N/A) 20000 2000 California Sacramento_ CA 69 Female White Widowed 30400 2000 California Sacramento_ CA 52 Female White Separated 13000 2000 California Sacramento_ CA 42 Male White Married_ spouse present 81000 2000 California Sacramento_ CA 47 Female White Married_ spouse present 13400 2000 California Sacramento_ CA 12 Female White Never married/single (N/A) 2000 California Sacramento_ CA 59 Female White Married_ spouse present 30400 2000 California Sacramento_ CA 14 Female White Never married/single (N/A) 2000 California Sacramento_ CA 6 Male White Never married/single (N/A) 2000 California Sacramento_ CA 1 Male White Never married/single (N/A) 2000 California Sacramento_ CA 28 Female White Married_ spouse present 37600 2000 California Sacramento_ CA 1 Female White Never married/single (N/A) 2000 California Sacramento_ CA 0 Female White Never married/single (N/A) 2000 California Sacramento_ CA 62 Male White Married_ spouse present 115000 2000 California Sacramento_ CA 83 Female Chinese Widowed 82100 2000 California Sacramento_ CA 9 Female White Never married/single (N/A) 2000 California Sacramento_ CA 50 Male White Married_ spouse present 50000 2000 California Sacramento_ CA 48 Male White Married_ spouse present 13600 2000 California Sacramento_ CA 23 Male White Never married/single (N/A) 900 2000 Ohio Cleveland_ OH 76 Male White Married_ spouse absent 33000 2000 Ohio Cleveland_ OH 68 Male White Married_ spouse present 41300 2000 Ohio Cleveland_ OH 46 Female White Married_ spouse present 47700 2000 Ohio Cleveland_ OH 45 Male White Widowed 6690 2000 Ohio Cleveland_ OH 48 Male White Married_ spouse present 90000 2000 Ohio Cleveland_ OH 48 Female White Married_ spouse present 21000 2000 Ohio Cleveland_ OH 15 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 50 Male White Married_ spouse present 81000 2000 Ohio Cleveland_ OH 50 Female White Married_ spouse present 17000 2000 Ohio Cleveland_ OH 62 Female White Married_ spouse present 2300 2000 Ohio Cleveland_ OH 30 Male White Married_ spouse present 35200 2000 Ohio Cleveland_ OH 31 Female White Married_ spouse present 24600 2000 Ohio Cleveland_ OH 5 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 49 Male White Married_ spouse present 127000 2000 Ohio Cleveland_ OH 16 Male White Never married/single (N/A) 130 2000 Ohio Cleveland_ OH 88 Female White Widowed 19900 2000 Ohio Cleveland_ OH 35 Female White Married_ spouse present 10000 2000 Ohio Cleveland_ OH 38 Male White Never married/single (N/A) 18200 2000 Ohio Cleveland_ OH 67 Female White Married_ spouse present 8400 2000 Ohio Cleveland_ OH 48 Female White Married_ spouse present 0 2000 Ohio Cleveland_ OH 32 Male White Married_ spouse present 43000 2000 Ohio Cleveland_ OH 21 Female Black/Negro Never married/single (N/A) 10600 2000 Ohio Cleveland_ OH 32 Female White Married_ spouse present 27600 2000 Ohio Cleveland_ OH 12 Female Two major races Never married/single (N/A) 2000 Ohio Cleveland_ OH 7 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 5 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 38 Female White 
Married_ spouse present 0 2000 Ohio Cleveland_ OH 11 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 8 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 5 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 0 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 42 Male White Married_ spouse present 96030 2000 Ohio Cleveland_ OH 42 Female White Married_ spouse present 28000 2000 Ohio Cleveland_ OH 25 Male White Divorced 25000 2000 Ohio Cleveland_ OH 54 Male White Married_ spouse present 31000 2000 Ohio Cleveland_ OH 52 Female White Married_ spouse present 36100 2000 Ohio Cleveland_ OH 29 Male White Separated 61000 2000 Ohio Cleveland_ OH 58 Female White Divorced 6100 2000 Ohio Cleveland_ OH 70 Male White Married_ spouse present 0 2000 Ohio Cleveland_ OH 24 Female White Never married/single (N/A) 27000 2000 Ohio Cleveland_ OH 14 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 12 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 43 Female Other race_ nec Married_ spouse present 39000 2000 Ohio Cleveland_ OH 13 Male Other race_ nec Never married/single (N/A) 2000 Ohio Cleveland_ OH 20 Female Other race_ nec Never married/single (N/A) 1540 2000 Ohio Cleveland_ OH 54 Female White Divorced 55000 2000 Ohio Cleveland_ OH 15 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 13 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 38 Female White Married_ spouse present 29000 2000 Ohio Cleveland_ OH 42 Female American Indian or Alaska Native Never married/single (N/A) 14100 2000 Ohio Cleveland_ OH 58 Male White Married_ spouse present 152400 2000 Ohio Cleveland_ OH 26 Male White Married_ spouse present 36600 2000 Ohio Cleveland_ OH 8 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 5 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 0 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 65 Male White Married_ spouse present 20100 2000 Ohio Cleveland_ OH 56 Female White Married_ spouse present 300 2000 Ohio Cleveland_ OH 61 Female White Married_ spouse present 6400 2000 Ohio Cleveland_ OH 50 Male White Married_ spouse present 75600 2000 Ohio Cleveland_ OH 41 Female White Married_ spouse present 490 2000 Ohio Cleveland_ OH 11 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 86 Male White Widowed 29500 2000 Ohio Cleveland_ OH 45 Male White Divorced 67900 2000 Ohio Cleveland_ OH 33 Male White Never married/single (N/A) 22000 2000 Ohio Cleveland_ OH 51 Female White Married_ spouse absent 9600 2000 Ohio Cleveland_ OH 15 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 32 Female White Divorced 35500 2000 Ohio Cleveland_ OH 22 Female White Never married/single (N/A) 24000 2000 Ohio Cleveland_ OH 10 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 77 Male White Married_ spouse present 13210 2000 Ohio Cleveland_ OH 75 Female White Married_ spouse present 14920 2000 Ohio Cleveland_ OH 57 Female White Married_ spouse present 700 2000 Ohio Cleveland_ OH 23 Female White Never married/single (N/A) 11000 2000 Ohio Cleveland_ OH 4 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 37 Male White Married_ spouse present 30100 2000 Ohio Cleveland_ OH 72 Female White Married_ spouse present 5300 2000 Ohio Cleveland_ OH 62 Female White Divorced 9000 2000 Ohio Cleveland_ OH 77 Male White Divorced 10780 2000 Ohio Cleveland_ OH 41 Male White Never married/single (N/A) 18000 2000 Ohio Cleveland_ OH 52 Female White Divorced 48700 2000 
Ohio Cleveland_ OH 53 Male White Divorced 35000 2000 Ohio Cleveland_ OH 43 Male White Married_ spouse present 62000 2000 Ohio Cleveland_ OH 14 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 10 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 56 Male White Married_ spouse present 59300 2000 Ohio Cleveland_ OH 53 Female White Married_ spouse present 35000 2000 Ohio Cleveland_ OH 60 Male White Married_ spouse present 36004 2000 Ohio Cleveland_ OH 57 Female White Married_ spouse present 25010 2000 Ohio Cleveland_ OH 50 Male White Married_ spouse present 37700 2000 Ohio Cleveland_ OH 45 Female White Married_ spouse present 33600 2000 Ohio Cleveland_ OH 18 Male White Never married/single (N/A) 8840 2000 Ohio Cleveland_ OH 11 Male Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 39 Female White Married_ spouse present 24800 2000 Ohio Cleveland_ OH 35 Male White Married_ spouse present 54450 2000 Ohio Cleveland_ OH 2 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 56 Female White Divorced 54400 2000 Ohio Cleveland_ OH 93 Female White Widowed 0 2000 Ohio Cleveland_ OH 69 Male White Widowed 47990 2000 Ohio Cleveland_ OH 51 Male White Married_ spouse present 131000 2000 Ohio Cleveland_ OH 53 Female White Married_ spouse present 70000 2000 Ohio Cleveland_ OH 80 Male White Married_ spouse present 43200 2000 Ohio Cleveland_ OH 68 Female White Married_ spouse present 70800 2000 Ohio Cleveland_ OH 38 Male White Never married/single (N/A) 25000 2000 Ohio Cleveland_ OH 34 Female White Married_ spouse present 30000 2000 Ohio Cleveland_ OH 70 Female White Never married/single (N/A) 66700 2000 Ohio Cleveland_ OH 57 Male White Married_ spouse present 60000 2000 Ohio Cleveland_ OH 47 Female White Never married/single (N/A) 22000 2000 Ohio Cleveland_ OH 67 Female White Married_ spouse absent 28900 2000 Ohio Cleveland_ OH 35 Female White Divorced 24100 2000 Ohio Cleveland_ OH 15 Male White Never married/single (N/A) 900 2000 Ohio Cleveland_ OH 13 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 58 Female White Widowed 48900 2000 Ohio Cleveland_ OH 72 Female White Widowed 13600 2000 Ohio Cleveland_ OH 27 Female Black/Negro Never married/single (N/A) 20700 2000 Ohio Cleveland_ OH 7 Male Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 4 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 0 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 73 Male White Widowed 18500 2000 Ohio Cleveland_ OH 65 Male Black/Negro Married_ spouse present 21800 2000 Ohio Cleveland_ OH 66 Female Black/Negro Married_ spouse present 3600 2000 Ohio Cleveland_ OH 63 Male White Married_ spouse present 9000 2000 Ohio Cleveland_ OH 60 Female White Married_ spouse present 0 2000 Ohio Cleveland_ OH 79 Male White Never married/single (N/A) 13400 2000 Ohio Cleveland_ OH 83 Female White Widowed 10400 2000 Ohio Cleveland_ OH 7 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 83 Male White Widowed 18000 2000 Ohio Cleveland_ OH 62 Male Black/Negro Never married/single (N/A) 6000 2000 Ohio Cleveland_ OH 12 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 1 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 14 Male Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 8 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 71 Female White Never married/single (N/A) 6000 2000 Ohio Cleveland_ OH 68 Female White Widowed 13570 2000 Ohio Cleveland_ OH 51 Male White 
Married_ spouse present 32700 2000 Ohio Cleveland_ OH 50 Female White Married_ spouse present 32600 2000 Ohio Cleveland_ OH 19 Male White Never married/single (N/A) 4560 2000 Ohio Cleveland_ OH 53 Female White Never married/single (N/A) 42000 2000 Ohio Cleveland_ OH 24 Female Black/Negro Never married/single (N/A) 2600 2000 Ohio Cleveland_ OH 2 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 77 Female White Widowed 12390 2000 Ohio Cleveland_ OH 37 Female Black/Negro Never married/single (N/A) 23200 2000 Ohio Cleveland_ OH 16 Female Black/Negro Never married/single (N/A) 500 2000 Ohio Cleveland_ OH 13 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 16 Female Black/Negro Never married/single (N/A) 500 2000 Ohio Cleveland_ OH 2 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 12 Male Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 11 Female Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 66 Male Black/Negro Divorced 10010 2000 Ohio Cleveland_ OH 41 Male Black/Negro Never married/single (N/A) 5600 2000 Ohio Cleveland_ OH 30 Female Black/Negro Never married/single (N/A) 12004 2000 Ohio Cleveland_ OH 42 Female White Divorced 41600 2000 Ohio Cleveland_ OH 65 Male White Divorced 44000 2000 Ohio Cleveland_ OH 47 Male White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 47 Female White Never married/single (N/A) 10600 2000 Ohio Cleveland_ OH 59 Female White Divorced 0 2000 Ohio Cleveland_ OH 86 Female Black/Negro Widowed 17860 2000 Ohio Cleveland_ OH 70 Male Black/Negro Married_ spouse present 20800 2000 Ohio Cleveland_ OH 69 Female Black/Negro Married_ spouse present 29900 2000 Ohio Cleveland_ OH 52 Female White Divorced 500 2000 Ohio Cleveland_ OH 49 Female Black/Negro Never married/single (N/A) 5500 2000 Ohio Cleveland_ OH 63 Female White Divorced 7600 2000 Ohio Cleveland_ OH 33 Male White Never married/single (N/A) 31900 2000 Ohio Cleveland_ OH 73 Female White Widowed 8700 2000 Ohio Cleveland_ OH 27 Male White Divorced 19000 2000 Ohio Cleveland_ OH 39 Male White Married_ spouse present 86000 2000 Ohio Cleveland_ OH 39 Female White Married_ spouse present 20000 2000 Ohio Cleveland_ OH 14 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 0 Male Black/Negro Never married/single (N/A) 2000 Ohio Cleveland_ OH 37 Male Black/Negro Never married/single (N/A) 37060 2000 Ohio Cleveland_ OH 32 Male White Married_ spouse present 0 2000 Ohio Cleveland_ OH 59 Female Two major races Divorced 30000 2000 Ohio Cleveland_ OH 57 Female White Married_ spouse present 126000 2000 Ohio Cleveland_ OH 74 Male White Married_ spouse present 29900 2000 Ohio Cleveland_ OH 71 Female White Married_ spouse present 6500 2000 Ohio Cleveland_ OH 40 Male White Never married/single (N/A) 7800 2000 Ohio Cleveland_ OH 42 Female White Separated 19500 2000 Ohio Cleveland_ OH 79 Male White Married_ spouse present 32500 2000 Ohio Cleveland_ OH 47 Male White Married_ spouse present 35000 2000 Ohio Cleveland_ OH 49 Female White Married_ spouse present 20000 2000 Ohio Cleveland_ OH 9 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 85 Male White Widowed 17500 2000 Ohio Cleveland_ OH 40 Female White Divorced 23200 2000 Ohio Cleveland_ OH 14 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 84 Female White Widowed 7500 2000 Ohio Cleveland_ OH 53 Female White Never married/single (N/A) 32000 2000 Ohio Cleveland_ OH 26 Male Black/Negro Married_ spouse present 18200 2000 Ohio Cleveland_ OH 1 Male Black/Negro 
Never married/single (N/A) 2000 Ohio Cleveland_ OH 75 Male White Married_ spouse present 17000 2000 Ohio Cleveland_ OH 35 Female White Married_ spouse present 30000 2000 Ohio Cleveland_ OH 9 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 3 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 3 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 74 Female White Married_ spouse present 27000 2000 Ohio Cleveland_ OH 18 Female White Never married/single (N/A) 2800 2000 Ohio Cleveland_ OH 54 Female Black/Negro Divorced 11000 2000 Ohio Cleveland_ OH 58 Male Black/Negro Divorced 10000 2000 Ohio Cleveland_ OH 29 Male White Never married/single (N/A) 30300 2000 Ohio Cleveland_ OH 51 Male White Separated 40000 2000 Ohio Cleveland_ OH 63 Female White Divorced 60000 2000 Ohio Cleveland_ OH 67 Female White Widowed 9200 2000 Ohio Cleveland_ OH 69 Female Black/Negro Widowed 8500 2000 Ohio Cleveland_ OH 40 Female White Married_ spouse present 13130 2000 Ohio Cleveland_ OH 62 Male White Married_ spouse present 45500 2000 Ohio Cleveland_ OH 61 Female White Married_ spouse present 49000 2000 Ohio Cleveland_ OH 18 Male White Never married/single (N/A) 29000 2000 Ohio Cleveland_ OH 61 Male White Married_ spouse present 20000 2000 Ohio Cleveland_ OH 38 Male White Married_ spouse present 46200 2000 Ohio Cleveland_ OH 40 Male White Married_ spouse present 112000 2000 Ohio Cleveland_ OH 34 Female White Married_ spouse present 3200 2000 Ohio Cleveland_ OH 19 Male White Never married/single (N/A) 13000 2000 Ohio Cleveland_ OH 36 Male White Married_ spouse present 53500 2000 Ohio Cleveland_ OH 60 Male White Married_ spouse present 49200 2000 Ohio Cleveland_ OH 47 Female White Married_ spouse present 25350 2000 Ohio Cleveland_ OH 26 Female White Never married/single (N/A) 29300 2000 Ohio Cleveland_ OH 51 Female White Married_ spouse present 0 2000 Ohio Cleveland_ OH 34 Female White Separated 2500 2000 Ohio Cleveland_ OH 3 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 45 Male White Married_ spouse present 51910 2000 Ohio Cleveland_ OH 7 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 31 Female White Married_ spouse present 33000 2000 Ohio Cleveland_ OH 0 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 41 Female White Married_ spouse present 8000 2000 Ohio Cleveland_ OH 22 Male White Never married/single (N/A) 3200 2000 Ohio Cleveland_ OH 12 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 35 Male Other race_ nec Married_ spouse absent 13800 2000 Ohio Cleveland_ OH 11 Female Other race_ nec Never married/single (N/A) 2000 Ohio Cleveland_ OH 26 Male Other race_ nec Married_ spouse present 12000 2000 Ohio Cleveland_ OH 47 Male Other race_ nec Married_ spouse absent 12400 2000 Ohio Cleveland_ OH 62 Male White Married_ spouse present 0 2000 Ohio Cleveland_ OH 63 Female White Married_ spouse present 0 2000 Ohio Cleveland_ OH 18 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 17 Male White Never married/single (N/A) 830 2000 Ohio Cleveland_ OH 41 Female White Divorced 12000 2000 Ohio Cleveland_ OH 15 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 26 Male White Married_ spouse present 75400 2000 Ohio Cleveland_ OH 30 Female White Married_ spouse present 5600 2000 Ohio Cleveland_ OH 51 Male White Married_ spouse present 36100 2000 Ohio Cleveland_ OH 44 Female White Married_ spouse present 30100 2000 Ohio Cleveland_ OH 22 Female White Never married/single (N/A) 18000 2000 Ohio Cleveland_ OH 18 
Female White Never married/single (N/A) 6000 2000 Ohio Cleveland_ OH 21 Female White Never married/single (N/A) 6500 2000 Ohio Cleveland_ OH 74 Male White Married_ spouse present 23100 2000 Ohio Cleveland_ OH 27 Male White Divorced 60000 2000 Ohio Cleveland_ OH 8 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 47 Male White Married_ spouse present 33600 2000 Ohio Cleveland_ OH 72 Female White Widowed 21000 2000 Ohio Cleveland_ OH 61 Male White Married_ spouse present 143230 2000 Ohio Cleveland_ OH 16 Female White Never married/single (N/A) 0 2000 Ohio Cleveland_ OH 13 Male White Never married/single (N/A) 2000 Ohio Cleveland_ OH 43 Male White Married_ spouse present 116390 2000 Ohio Cleveland_ OH 39 Female White Married_ spouse present 23000 2000 Ohio Cleveland_ OH 10 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 6 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 62 Male White Widowed 420 2000 Ohio Cleveland_ OH 53 Male White Married_ spouse present 70000 2000 Ohio Cleveland_ OH 52 Female White Married_ spouse present 30000 2000 Ohio Cleveland_ OH 50 Male White Divorced 16500 2000 Ohio Cleveland_ OH 59 Female White Divorced 27200 2000 Ohio Cleveland_ OH 40 Male White Married_ spouse present 23700 2000 Ohio Cleveland_ OH 40 Female White Married_ spouse present 19800 2000 Ohio Cleveland_ OH 9 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 32 Female White Married_ spouse present 25700 2000 Ohio Cleveland_ OH 4 Female White Never married/single (N/A) 2000 Ohio Cleveland_ OH 34 Male White Never married/single (N/A) 36000 2000 Ohio Cleveland_ OH 73 Female White Widowed 25820 2000 Ohio Cleveland_ OH 59 Male White Married_ spouse present 34000 2000 Ohio Cleveland_ OH 60 Female White Married_ spouse present 21000 \ No newline at end of file diff --git a/docs/previous_versions/v0.4.0/data/dem_score.csv b/docs/previous_versions/v0.4.0/data/dem_score.csv new file mode 100755 index 000000000..c48fc1f49 --- /dev/null +++ b/docs/previous_versions/v0.4.0/data/dem_score.csv @@ -0,0 +1,97 @@ +country,1952,1957,1962,1967,1972,1977,1982,1987,1992 +Albania,-9,-9,-9,-9,-9,-9,-9,-9,5 +Argentina,-9,-1,-1,-9,-9,-9,-8,8,7 +Armenia,-9,-7,-7,-7,-7,-7,-7,-7,7 +Australia,10,10,10,10,10,10,10,10,10 +Austria,10,10,10,10,10,10,10,10,10 +Azerbaijan,-9,-7,-7,-7,-7,-7,-7,-7,1 +Belarus,-9,-7,-7,-7,-7,-7,-7,-7,7 +Belgium,10,10,10,10,10,10,10,10,10 +Bhutan,-10,-10,-10,-10,-10,-10,-10,-10,-10 +Bolivia,-4,-3,-3,-4,-7,-7,8,9,9 +Brazil,5,5,5,-9,-9,-4,-3,7,8 +Bulgaria,-7,-7,-7,-7,-7,-7,-7,-7,8 +Canada,10,10,10,10,10,10,10,10,10 +Chile,2,5,5,6,6,-7,-7,-6,8 +China,-8,-8,-8,-9,-8,-7,-7,-7,-7 +Colombia,-5,7,7,7,7,8,8,8,9 +Costa Rica,10,10,10,10,10,10,10,10,10 +Croatia,-7,-7,-7,-7,-7,-7,-5,-5,-3 +Cuba,0,-9,-7,-7,-7,-7,-7,-7,-7 +Czech Rep.,-7,-7,-7,-7,-7,-7,-7,-7,8 +Denmark,10,10,10,10,10,10,10,10,10 +Dominican Rep.,-9,-9,8,-3,-3,-3,6,6,6 +Ecuador,2,2,-1,-1,-5,-5,9,8,9 +Egypt,-7,-7,-7,-7,-7,-6,-6,-6,-6 +El Salvador,-6,-5,-3,0,-1,-6,2,6,7 +Estonia,-9,-7,-7,-7,-7,-7,-7,-7,6 +Ethiopia,-9,-9,-9,-9,-9,-7,-7,-8,0 +Finland,10,10,10,10,10,10,10,10,10 +France,10,10,5,5,8,8,8,9,9 +Georgia,-9,-7,-7,-7,-7,-7,-7,-7,4 +Germany,10,10,10,10,10,10,10,10,10 +Greece,4,4,4,-7,-7,8,8,10,10 +Guatemala,2,-6,-5,3,1,-3,-7,3,3 +Haiti,-5,-5,-9,-9,-10,-9,-9,-8,-7 +Honduras,-3,-1,-1,-1,-1,-1,6,5,6 +Hungary,-7,-7,-7,-7,-7,-7,-7,-7,10 +India,9,9,9,9,9,8,8,8,8 +Indonesia,0,-1,-5,-7,-7,-7,-7,-7,-7 +Iran,-1,-10,-10,-10,-10,-10,-6,-6,-6 +Iraq,-4,-4,-5,-5,-7,-7,-9,-9,-9 +Ireland,10,10,10,10,10,10,10,10,10 
+Israel,10,10,10,9,9,9,9,9,9 +Italy,10,10,10,10,10,10,10,10,10 +Japan,10,10,10,10,10,10,10,10,10 +Jordan,-1,-9,-9,-9,-9,-10,-10,-9,-2 +Kazakhstan,-9,-7,-7,-7,-7,-7,-7,-7,-3 +"Korea, Dem. Rep.",-7,-8,-8,-9,-9,-9,-9,-9,-9 +"Korea, Rep.",-4,-4,-7,3,-9,-8,-5,1,6 +Kyrgyzstan,-9,-7,-7,-7,-7,-7,-7,-7,-3 +Latvia,-9,-7,-7,-7,-7,-7,-7,-7,8 +Lebanon,2,2,2,2,5,0,0,0,0 +Liberia,-6,-6,-6,-6,-6,-6,-7,-6,0 +Libya,-7,-7,-7,-7,-7,-7,-7,-7,-7 +Lithuania,-9,-7,-7,-7,-7,-7,-7,-7,10 +"Macedonia, FYR",-7,-7,-7,-7,-7,-7,-5,-5,6 +Mexico,-6,-6,-6,-6,-6,-3,-3,-3,0 +Moldova,-9,-7,-7,-7,-7,-7,-7,-7,5 +Mongolia,-7,-7,-7,-7,-7,-7,-7,-7,9 +Montenegro,-7,-7,-7,-7,-7,-7,-5,-5,-5 +Myanmar,8,8,-6,-7,-7,-6,-8,-8,-7 +Nepal,-7,-4,-9,-9,-9,-9,-2,-2,5 +Netherlands,10,10,10,10,10,10,10,10,10 +New Zealand,10,10,10,10,10,10,10,10,10 +Nicaragua,-8,-8,-8,-8,-8,-8,-5,-1,6 +Norway,10,10,10,10,10,10,10,10,10 +Oman,-6,-10,-10,-10,-10,-10,-10,-10,-9 +Pakistan,5,8,1,1,4,-7,-7,-4,8 +Panama,-1,4,4,4,-7,-7,-5,-8,8 +Paraguay,-5,-9,-9,-8,-8,-8,-8,-8,7 +Peru,-2,5,-6,5,-7,-7,7,7,-3 +Philippines,5,5,5,5,-9,-9,-7,8,8 +Poland,-7,-7,-7,-7,-7,-7,-8,-6,8 +Portugal,-9,-9,-9,-9,-9,9,10,10,10 +Romania,-7,-7,-7,-7,-7,-8,-8,-8,5 +Russia,-9,-7,-7,-7,-7,-7,-7,-7,5 +Saudi Arabia,-10,-10,-10,-10,-10,-10,-10,-10,-10 +Serbia,-7,-7,-7,-7,-7,-7,-5,-5,-5 +Slovak Republic,-7,-7,-7,-7,-7,-7,-7,-7,8 +Slovenia,-7,-7,-7,-7,-7,-7,-5,-5,10 +South Africa,4,4,4,4,4,4,4,4,6 +Spain,-7,-7,-7,-7,-7,5,10,10,10 +Sri Lanka,7,7,7,7,8,8,5,5,5 +Sweden,10,10,10,10,10,10,10,10,10 +Switzerland,10,10,10,10,10,10,10,10,10 +Syria,-7,7,-2,-7,-9,-9,-9,-9,-9 +Taiwan,-8,-8,-8,-8,-8,-7,-7,-1,7 +Tajikistan,-9,-7,-7,-7,-7,-7,-7,-7,-6 +Thailand,-6,-3,-7,-7,-7,-2,2,2,9 +Turkey,7,4,9,8,-2,9,-5,7,9 +Turkmenistan,-9,-7,-7,-7,-7,-7,-7,-7,-9 +Ukraine,-9,-7,-7,-7,-7,-7,-7,-7,6 +United Kingdom,10,10,10,10,10,10,10,10,10 +United States,10,10,10,10,10,10,10,10,10 +Uruguay,8,8,8,8,-3,-8,-7,9,10 +Uzbekistan,-9,-7,-7,-7,-7,-7,-7,-7,-9 +Venezuela,-3,-3,6,6,9,9,9,9,8 diff --git a/docs/previous_versions/v0.4.0/data/dem_score.xlsx b/docs/previous_versions/v0.4.0/data/dem_score.xlsx new file mode 100755 index 000000000..85d90daa9 Binary files /dev/null and b/docs/previous_versions/v0.4.0/data/dem_score.xlsx differ diff --git a/docs/previous_versions/v0.4.0/data/ideology.csv b/docs/previous_versions/v0.4.0/data/ideology.csv new file mode 100755 index 000000000..302957298 --- /dev/null +++ b/docs/previous_versions/v0.4.0/data/ideology.csv @@ -0,0 +1,76 @@ +city,state,state_ideology +New York,New York,Liberal +Chicago,Illinois,Liberal +Los Angeles,California,Liberal +Washington,DC,Liberal +Houston,Texas,Conservative +Philadelphia,Pennsylvania,Conservative +Phoenix,Arizona,Conservative +San Diego,California,Liberal +Dallas,Texas,Conservative +Detroit,Michigan,Conservative +San Francisco,California,Liberal +San Antonio,Texas,Conservative +Atlanta,Georgia,Conservative +Las Vegas,Nevada,Liberal +Baltimore,Maryland,Liberal +Boston,Massachusetts,Liberal +"Jacksonville, Fla.",Florida,Conservative +"El Paso, Texas",Texas,Conservative +"Columbus, Ohio",Ohio,Conservative +Cleveland,Ohio,Conservative +"Tucson, Ariz.",Arizona,Conservative +"Newark, N.J.",New Jersey,Liberal +"Austin, Texas",Texas,Conservative +"Memphis, Tenn.",Tennessee,Conservative +Milwaukee,Wisconsin,Conservative +"San Jose, Calif.",California,Liberal +Miami,Florida,Conservative +Denver,Colorado,Liberal +"Sacramento, Calif.",California,Liberal +"Charlotte, N.C.",North Carolina,Conservative +"Tampa, Fla.",Florida,Conservative +Indianapolis,Indiana,Conservative 
+"Santa Ana, Calif.",California,Liberal +New Orleans,Louisiana,Conservative +"Oakland, Calif.",California,Liberal +"Orlando, Fla.",Florida,Conservative +"Oklahoma City, Okla.",Oklahoma,Conservative +Seattle,Washington,Liberal +"Kansas City, Mo.",Missouri,Conservative +"Nashville, Tenn.",Tennessee,Conservative +"Laredo, Texas",Texas,Conservative +"Fort Worth, Texas",Texas,Conservative +"Louisville, Ky.",Kentucky,Conservative +"Norfolk, Va.",Virginia,Liberal +"Arlington, Va.",Virginia,Liberal +Pittsburgh,Pennsylvania,Conservative +"Albuquerque, N.M.",New Mexico,Liberal +"Jersey City, N.J.",New Jersey,Liberal +"Raleigh, N.C.",North Carolina,Conservative +"Rochester, N.Y.",New York,Liberal +Cincinnati,Ohio,Conservative +"Long Beach, Calif.",California,Liberal +"Birmingham, Ala.",Alabama,Conservative +"Wichita, Kan.",Kansas,Conservative +"Virginia Beach, Va.",Virginia,Liberal +"Fresno, Calif.",California,Liberal +"Buffalo, N.Y.",New York,Liberal +Minneapolis,Minneapolis,Liberal +"Portland, Ore.",Oregon,Liberal +"Reno, Nev.",Nevada,Liberal +"Richmond, Va.",Virginia,Liberal +"Baton Rouge, La.",Louisiana,Conservative +"Jackson, Miss.",Mississippi,Conservative +"Riverside, Calif.",California,Liberal +"Fort Lauderdale, Fla.",Florida,Conservative +St. Louis,Missouri,Conservative +"Brownsville, Texas",Texas,Conservative +"Albany, N.Y.",New York,Liberal +"Colorado Springs, Colo.",Colorado,Liberal +"Savannah, Ga.",Georgia,Conservative +"Winston-Salem, N.C.",North Carolina,Conservative +"Toledo, Ohio",Ohio,Conservative +"Madison, Wis.",Wisconsin,Conservative +"Corpus Christi, Texas",Texas,Conservative +"San Bernardino, Calif.",California,Liberal \ No newline at end of file diff --git a/docs/previous_versions/v0.4.0/data/le_mess.csv b/docs/previous_versions/v0.4.0/data/le_mess.csv new file mode 100755 index 000000000..7cc6fb6fc --- /dev/null +++ b/docs/previous_versions/v0.4.0/data/le_mess.csv @@ -0,0 +1,203 @@ +country,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016 +Afghanistan,27.13,27.67,28.19,28.73,29.27,29.8,30.34,30.86,31.4,31.94,32.47,33.01,33.53,34.07,34.6,35.13,35.66,36.17,36.69,37.2,37.7,38.19,38.67,39.14,39.61,40.07,40.53,40.98,41.46,41.96,42.51,43.11,43.75,44.45,45.21,46.02,46.87,47.74,48.62,49.5,49.3,49.4,49.5,48.9,49.4,49.7,49.5,48.6,50.0,50.1,50.4,51.0,51.4,51.8,52.0,52.1,52.4,52.8,53.3,53.6,54.0,54.4,54.8,54.9,53.8,52.72 +Albania,54.72,55.23,55.85,56.59,57.45,58.42,59.48,60.6,61.75,62.87,63.92,64.84,65.6,66.18,66.59,66.88,67.11,67.32,67.55,67.83,68.16,68.53,68.93,69.35,69.77,70.17,70.54,70.86,71.14,71.39,71.63,71.88,72.15,72.42,72.71,72.96,73.14,73.25,73.3,73.3,73.4,73.6,73.6,73.6,73.7,73.8,74.1,74.2,74.2,74.7,75.1,75.5,75.7,75.9,76.2,76.4,76.6,76.8,77.0,77.2,77.4,77.5,77.7,77.9,78.0,78.1 +Algeria,43.03,43.5,43.96,44.44,44.93,45.44,45.94,46.45,46.97,47.5,48.02,48.55,49.07,49.58,50.09,50.58,51.05,51.49,51.95,52.41,52.88,53.38,53.91,54.52,55.24,56.11,57.13,58.28,59.56,60.92,62.31,63.69,64.97,66.15,67.18,68.04,68.75,69.33,69.81,70.2,70.5,70.9,71.2,71.4,71.6,72.1,72.4,72.6,73.0,73.3,73.5,73.8,73.9,74.4,74.8,75.0,75.3,75.5,75.7,76.0,76.1,76.2,76.3,76.3,76.4,76.5 
+Angola,31.05,31.59,32.14,32.69,33.24,33.78,34.33,34.88,35.43,35.98,36.53,37.08,37.63,38.18,38.74,39.28,39.84,40.39,40.95,41.5,42.06,42.62,43.17,43.71,44.22,44.68,45.12,45.5,45.84,46.14,46.42,46.69,46.96,47.23,47.5,47.75,47.99,48.2,48.4,48.6,49.3,49.6,48.4,50.0,50.9,51.3,51.7,51.8,51.8,52.3,52.5,53.3,53.9,54.5,55.2,55.7,56.2,56.7,57.1,57.6,58.1,58.5,58.8,59.2,59.6,60.0 +Antigua and Barbuda,58.26,58.8,59.34,59.87,60.41,60.93,61.45,61.97,62.48,62.97,63.46,63.93,64.38,64.81,65.23,65.63,66.03,66.41,66.81,67.19,67.56,67.94,68.3,68.64,68.99,69.32,69.64,69.96,70.28,70.59,70.9,71.22,71.52,71.82,72.13,72.42,72.7,72.97,73.24,73.5,73.6,73.5,73.4,73.4,73.5,73.5,73.9,74.1,74.0,73.8,74.1,74.3,74.5,74.6,74.9,74.9,75.3,75.5,75.7,75.8,75.9,76.1,76.2,76.3,76.4,76.5 +Argentina,61.93,62.54,63.1,63.59,64.03,64.41,64.73,65.0,65.22,65.39,65.53,65.64,65.74,65.84,65.95,66.08,66.26,66.47,66.72,67.01,67.32,67.64,67.96,68.28,68.6,68.92,69.24,69.57,69.89,70.2,70.51,70.78,71.04,71.26,71.46,71.66,71.84,72.05,72.26,72.5,72.7,72.8,73.1,73.4,73.5,73.5,73.6,73.8,73.9,74.2,74.3,74.3,74.5,75.0,75.3,75.3,75.2,75.4,75.6,75.8,76.0,76.1,76.2,76.3,76.5,76.7 +Armenia,62.67,63.13,63.6,64.07,64.54,65.0,65.45,65.92,66.39,66.86,67.33,67.82,68.3,68.78,69.26,69.74,70.22,70.67,71.1,71.47,71.79,72.02,72.19,72.28,72.33,72.38,72.44,72.53,72.63,72.72,72.73,72.64,72.43,72.1,71.7,71.24,70.82,70.46,70.22,70.1,69.7,68.8,68.3,68.6,69.1,69.4,70.0,70.5,70.8,71.3,71.4,71.6,71.5,71.8,71.8,71.7,72.3,72.3,72.6,73.0,73.5,73.9,74.3,74.5,74.7,74.9 +Aruba,58.96,60.01,60.98,61.87,62.69,63.42,64.09,64.68,65.2,65.66,66.07,66.44,66.79,67.11,67.44,67.76,68.1,68.44,68.78,69.14,69.5,69.85,70.19,70.52,70.83,71.14,71.44,71.74,72.02,72.29,72.54,72.75,72.93,73.07,73.18,73.26,73.33,73.38,73.43,73.47,73.51,73.54,73.57,73.6,73.62,73.65,73.67,73.7,73.73,73.78,73.85,73.94,74.05,74.18,74.32,74.47,74.62,74.77,74.92,75.06,75.19,75.32,75.46,75.59,75.72,75.85 +Australia,68.71,69.11,69.69,69.84,70.16,70.03,70.31,70.86,70.43,70.87,71.14,70.91,70.97,70.63,70.96,70.79,71.07,70.7,71.11,70.78,71.38,71.9,72.11,71.86,72.81,72.84,73.45,73.84,74.4,74.56,74.92,74.7,75.51,75.98,75.41,76.08,76.27,76.3,76.4,77.0,77.4,77.6,77.9,78.1,78.3,78.5,78.8,79.2,79.4,79.8,80.1,80.3,80.6,80.9,81.2,81.4,81.5,81.6,81.8,82.0,82.2,82.4,82.4,82.3,82.3,82.3 +Austria,65.24,66.78,67.27,67.3,67.58,67.7,67.46,68.46,68.39,68.75,69.72,69.51,69.64,70.13,69.92,70.22,70.1,70.25,70.02,70.07,70.27,70.59,71.16,71.15,71.28,71.77,72.12,72.2,72.51,72.64,72.96,73.12,73.19,73.73,73.95,74.43,74.86,75.34,75.43,75.7,75.8,76.0,76.2,76.5,76.8,77.1,77.6,77.8,78.0,78.2,78.6,78.8,79.0,79.4,79.5,80.0,80.1,80.4,80.3,80.5,80.7,80.9,81.1,81.2,81.3,81.4 +Azerbaijan,57.5,57.93,58.36,58.79,59.21,59.63,60.05,60.48,60.9,61.33,61.76,62.2,62.62,63.06,63.49,63.91,64.35,64.75,65.14,65.48,65.75,65.93,66.04,66.05,66.02,65.92,65.8,65.68,65.6,65.55,65.61,65.73,65.92,66.15,66.37,66.48,66.46,66.28,65.98,65.6,65.3,63.7,64.0,63.5,64.6,65.0,65.3,65.6,65.9,66.5,67.2,67.6,67.6,67.8,68.2,68.7,69.1,69.2,69.7,70.1,70.8,71.5,72.1,72.5,72.9,73.3 +Bahamas,58.91,59.29,59.67,60.03,60.39,60.72,61.06,61.38,61.69,62.0,62.29,62.58,62.85,63.13,63.4,63.65,63.91,64.14,64.39,64.61,64.85,65.08,65.3,65.53,65.74,65.96,66.16,66.37,66.57,66.75,66.95,67.12,67.31,67.5,67.67,67.86,68.02,68.2,68.35,68.5,68.9,69.2,69.7,69.5,69.7,70.0,70.2,70.1,70.1,70.2,70.3,70.4,71.1,71.7,71.7,72.0,71.8,72.2,72.7,72.7,72.6,72.7,72.9,73.5,73.7,73.9 
+Bahrain,41.45,42.32,43.26,44.27,45.35,46.49,47.7,48.97,50.29,51.64,52.99,54.33,55.64,56.9,58.1,59.23,60.29,61.29,62.22,63.1,63.92,64.67,65.38,66.03,66.63,67.2,67.72,68.21,68.67,69.09,69.47,69.83,70.16,70.46,70.73,70.98,71.2,71.41,71.61,71.8,72.0,72.1,72.5,72.9,73.0,73.4,73.8,74.0,74.2,73.7,74.3,74.8,75.3,75.7,76.1,76.3,77.0,77.6,78.2,78.7,78.8,79.0,79.1,79.1,79.1,79.1 +Bangladesh,42.58,42.87,43.19,43.54,43.91,44.3,44.73,45.19,45.68,46.2,46.73,47.28,47.81,48.29,48.6,48.63,48.37,47.83,47.09,46.31,45.74,45.52,45.77,46.49,47.58,48.92,50.27,51.47,52.44,53.18,53.72,54.15,54.57,55.0,55.47,55.96,56.46,56.94,57.42,57.9,56.4,59.7,60.5,61.2,61.6,62.4,63.2,63.9,64.6,64.9,65.4,65.8,66.3,66.8,67.1,67.5,67.7,68.3,68.6,68.8,69.3,69.4,69.8,70.1,70.4,70.7 +Barbados,56.82,57.41,57.99,58.56,59.13,59.67,60.22,60.76,61.28,61.8,62.31,62.79,63.27,63.74,64.2,64.64,65.08,65.5,65.91,66.31,66.71,67.09,67.47,67.83,68.17,68.53,68.87,69.22,69.57,69.91,70.25,70.58,70.91,71.23,71.54,71.85,72.14,72.43,72.72,73.0,73.2,73.2,73.1,73.0,73.3,73.7,73.9,74.1,74.2,74.0,74.4,74.6,74.8,74.9,75.0,75.0,75.1,75.3,75.3,75.2,75.2,75.4,75.5,75.6,75.7,75.8 +Belarus,65.11,65.54,65.96,66.37,66.77,67.16,67.52,67.88,68.82,71.59,72.3,71.01,71.66,73.17,72.7,73.05,72.78,72.88,72.47,71.94,72.56,72.26,72.29,72.57,71.63,71.46,71.39,71.23,70.82,70.57,70.84,70.95,70.73,70.09,70.28,71.66,71.55,71.28,71.05,70.5,70.1,69.6,68.9,68.6,68.2,68.1,68.0,67.9,67.7,68.1,68.0,67.9,68.2,68.5,68.7,69.1,69.7,70.0,70.1,70.2,70.3,70.4,70.6,70.7,71.0,71.3 +Belgium,66.77,67.97,68.33,68.59,68.54,68.83,69.19,69.88,70.28,69.59,70.46,70.19,70.0,70.66,70.51,70.58,70.86,70.55,70.63,70.89,71.01,71.35,71.56,71.91,71.9,72.05,72.7,72.64,73.13,73.18,73.59,73.81,73.81,74.31,74.41,74.61,75.22,75.53,75.59,76.0,76.2,76.3,76.5,76.6,76.9,77.2,77.4,77.5,77.7,77.8,78.0,78.2,78.5,79.0,79.1,79.5,79.5,79.6,79.8,80.1,80.2,80.3,80.4,80.5,80.5,80.5 +Belize,55.15,55.7,56.27,56.82,57.37,57.91,58.46,58.99,59.54,60.08,60.64,61.2,61.78,62.36,62.95,63.53,64.11,64.67,65.21,65.72,66.21,66.66,67.11,67.52,67.93,68.32,68.7,69.06,69.43,69.78,70.13,70.47,70.8,71.09,71.34,71.51,71.6,71.61,71.54,71.4,71.2,71.1,70.8,70.6,70.5,70.4,69.7,69.5,69.3,69.0,68.8,69.3,69.6,69.9,70.0,70.3,70.6,70.7,70.9,71.2,71.2,71.3,71.3,71.5,71.7,71.9 +Benin,33.53,34.09,34.64,35.19,35.72,36.25,36.77,37.28,37.79,38.29,38.8,39.32,39.85,40.38,40.93,41.5,42.09,42.69,43.31,43.93,44.55,45.16,45.77,46.36,46.93,47.46,47.96,48.43,48.88,49.34,49.84,50.38,50.97,51.62,52.33,53.09,53.89,54.67,55.42,56.1,56.3,56.6,56.9,56.8,56.7,56.6,56.9,57.0,57.1,57.2,57.4,57.7,57.9,58.2,58.6,58.9,59.2,59.7,60.4,60.8,61.1,61.4,61.7,62.0,62.3,62.6 +Bhutan,30.94,31.47,32.01,32.56,33.12,33.68,34.25,34.81,35.38,35.94,36.49,37.04,37.57,38.12,38.68,39.28,39.94,40.66,41.45,42.31,43.23,44.2,45.18,46.2,47.21,48.22,49.22,50.21,51.18,52.12,53.05,53.96,54.87,55.78,56.69,57.61,58.54,59.48,60.44,61.4,61.9,62.4,62.8,63.1,63.8,64.7,65.1,65.6,66.5,65.9,67.5,68.1,68.5,68.9,69.3,69.8,70.3,70.7,70.9,71.4,71.7,71.9,72.2,72.4,72.7,73.0 +Bolivia,40.6,40.94,41.28,41.64,41.98,42.34,42.7,43.05,43.41,43.77,44.14,44.5,44.88,45.24,45.62,45.99,46.34,46.69,47.05,47.44,47.86,48.34,48.89,49.5,50.19,50.93,51.73,52.54,53.38,54.21,55.04,55.87,56.67,57.47,58.22,58.96,59.65,60.33,60.98,61.6,62.2,62.7,63.2,63.8,64.4,65.1,65.6,66.3,66.9,67.6,68.3,68.7,69.3,69.8,70.2,70.6,70.9,71.2,71.6,71.8,72.1,72.4,72.7,72.9,73.2,73.5 +Bosnia and 
Herzegovina,53.22,54.49,55.7,56.85,57.94,58.97,59.95,60.87,61.74,62.56,63.34,64.07,64.78,65.46,66.14,66.81,67.47,68.14,68.82,69.49,70.17,70.84,71.49,72.12,72.71,73.24,73.71,74.12,74.48,74.82,75.2,75.65,76.15,76.63,76.95,76.89,76.37,75.39,74.07,72.7,72.7,68.0,68.3,71.1,67.0,73.8,74.4,74.8,75.3,75.7,76.2,76.4,76.7,76.9,77.0,77.1,77.3,77.5,77.7,77.9,78.2,78.4,78.6,78.7,78.9,79.1 +Botswana,46.87,47.27,47.66,48.05,48.45,48.84,49.23,49.61,49.99,50.34,50.7,51.02,51.35,51.67,52.0,52.36,52.77,53.23,53.73,54.3,54.9,55.54,56.18,56.82,57.45,58.07,58.65,59.21,59.74,60.24,60.73,61.21,61.67,62.08,62.44,62.7,62.85,62.85,62.69,62.3,62.0,61.2,60.1,58.6,56.8,54.8,52.9,50.9,49.2,47.6,46.5,45.6,45.7,46.9,49.3,51.2,52.4,53.2,54.3,55.6,56.5,56.5,56.9,57.3,58.7,60.13 +Brazil,50.59,51.1,51.62,52.14,52.66,53.19,53.71,54.23,54.75,55.27,55.78,56.27,56.75,57.21,57.66,58.07,58.49,58.91,59.31,59.73,60.14,60.56,60.98,61.41,61.84,62.27,62.68,63.07,63.45,63.81,64.18,64.55,64.94,65.34,65.76,66.18,66.6,67.04,67.47,67.9,68.1,68.3,68.5,68.8,69.0,69.3,69.6,69.9,70.3,70.7,71.1,71.4,71.7,72.0,72.4,72.7,73.0,73.2,73.4,73.6,73.8,74.0,74.1,74.3,74.4,74.5 +Brunei,56.99,57.6,58.22,58.83,59.45,60.07,60.7,61.31,61.93,62.52,63.11,63.67,64.21,64.72,65.21,65.67,66.12,66.54,66.97,67.38,67.79,68.19,68.58,68.95,69.32,69.67,70.01,70.33,70.65,70.95,71.25,71.54,71.84,72.12,72.41,72.69,72.98,73.26,73.54,73.8,73.8,74.0,74.2,74.4,74.7,74.9,75.2,75.6,75.8,75.9,76.1,76.3,76.5,76.7,76.7,76.8,76.8,76.9,77.0,77.1,76.9,76.9,76.9,77.1,77.1,77.1 +Bulgaria,60.65,59.62,64.16,64.43,64.84,65.24,66.64,68.74,66.6,69.22,70.26,69.55,70.38,71.18,71.35,71.28,70.47,71.3,70.48,71.32,70.93,70.96,71.4,71.26,71.11,71.44,70.88,71.24,71.34,71.17,71.56,71.16,71.33,71.43,71.15,71.63,71.42,71.49,71.55,71.4,71.3,71.2,71.1,70.9,71.0,70.9,70.6,71.0,71.4,71.6,71.8,72.1,72.3,72.5,72.6,72.7,72.9,73.2,73.5,73.7,74.2,74.5,74.6,74.7,74.8,74.9 +Burkina Faso,30.65,31.18,31.69,32.21,32.72,33.21,33.71,34.21,34.71,35.21,35.72,36.23,36.75,37.27,37.8,38.3,38.8,39.3,39.78,40.27,40.75,41.25,41.78,42.36,43.0,43.74,44.61,45.56,46.58,47.61,48.58,49.45,50.17,50.71,51.08,51.28,51.38,51.42,51.42,51.4,51.4,51.3,51.3,51.3,51.3,51.5,51.6,51.8,52.2,52.6,53.2,53.8,54.5,55.1,55.9,56.6,57.4,58.0,58.5,59.0,59.5,59.9,60.3,60.6,60.9,61.2 +Burundi,38.19,38.45,38.72,38.98,39.25,39.51,39.77,40.04,40.3,40.58,40.85,41.13,41.41,41.69,41.95,42.18,42.37,42.52,42.65,42.76,42.91,43.1,43.35,43.66,44.02,44.43,44.84,45.24,45.6,45.93,46.22,46.49,46.75,46.95,47.08,47.05,46.88,46.54,46.1,45.6,45.4,45.3,45.1,45.0,44.5,44.3,45.0,45.5,46.3,46.7,48.4,49.8,51.3,53.0,54.7,56.4,57.9,59.1,60.0,60.4,60.8,61.1,61.3,61.4,61.4,61.4 +Cambodia,40.5,40.81,41.08,41.32,41.52,41.7,41.86,41.99,42.14,42.29,42.47,42.7,42.95,43.2,43.45,43.73,44.0,44.13,44.03,43.28,41.67,39.73,37.58,34.94,21.69,19.04,18.1,19.55,21.91,28.16,38.0,44.24,49.43,53.22,55.5,56.49,56.82,56.99,57.22,57.6,57.9,58.2,58.1,58.0,58.1,58.3,58.7,59.0,59.5,60.0,60.8,61.6,62.4,63.2,64.0,64.8,65.4,66.1,66.6,67.0,67.6,68.2,68.7,69.1,69.4,69.7 +Cameroon,39.08,39.51,39.94,40.41,40.87,41.37,41.88,42.39,42.93,43.46,44.0,44.53,45.07,45.59,46.13,46.67,47.22,47.79,48.37,48.97,49.59,50.22,50.85,51.49,52.13,52.74,53.36,53.95,54.52,55.06,55.56,56.03,56.45,56.83,57.17,57.48,57.75,58.01,58.22,58.4,58.2,57.9,57.4,57.0,56.5,56.2,55.5,55.0,54.7,54.3,54.2,54.2,54.3,54.4,54.9,55.4,55.7,56.6,57.3,57.8,58.1,58.5,59.0,59.1,59.4,59.7 
+Canada,68.53,68.72,69.1,69.96,70.02,70.0,69.92,70.58,70.62,71.0,71.22,71.25,71.26,71.64,71.74,71.86,72.07,72.23,72.39,72.58,72.91,72.81,73.04,73.12,73.41,73.84,74.13,74.46,74.81,75.05,75.46,75.67,76.04,76.33,76.31,76.46,76.76,76.82,77.09,77.4,77.6,77.7,77.8,77.9,78.0,78.3,78.6,78.8,79.0,79.2,79.5,79.6,79.8,80.1,80.2,80.5,80.6,80.8,81.1,81.3,81.6,81.6,81.6,81.7,81.7,81.7 +Cape Verde,48.45,48.63,48.81,49.0,49.19,49.38,49.57,49.76,49.95,50.12,50.27,50.43,50.59,50.77,51.0,51.32,51.75,52.32,53.0,53.78,54.65,55.57,56.5,57.41,58.3,59.16,60.0,60.82,61.62,62.41,63.19,63.95,64.69,65.43,66.12,66.75,67.33,67.85,68.3,68.7,68.6,68.6,68.4,68.3,68.3,68.2,68.2,68.2,68.2,68.4,68.6,68.7,68.9,69.1,69.3,69.6,69.6,70.4,70.7,71.1,71.4,71.9,72.3,72.7,72.9,73.1 +Central African Republic,33.34,33.79,34.26,34.72,35.18,35.62,36.07,36.53,36.97,37.43,37.89,38.36,38.85,39.36,39.92,40.5,41.15,41.84,42.57,43.36,44.19,45.04,45.91,46.77,47.6,48.36,49.07,49.7,50.21,50.61,50.86,50.96,50.95,50.81,50.57,50.21,49.8,49.34,48.86,48.4,48.1,48.0,47.5,47.2,46.7,46.3,45.9,45.7,45.5,45.3,45.2,45.2,45.2,45.4,45.5,45.8,46.2,46.8,47.6,47.9,48.1,48.5,47.8,48.2,49.6,51.04 +Chad,37.29,37.69,38.09,38.49,38.9,39.31,39.72,40.14,40.54,40.95,41.35,41.76,42.17,42.58,43.01,43.48,43.98,44.54,45.12,45.72,46.33,46.91,47.47,47.98,48.45,48.89,49.31,49.72,50.14,50.56,50.97,51.38,51.78,52.15,52.51,52.81,53.09,53.33,53.52,53.7,54.3,53.9,54.0,53.6,53.6,53.0,52.5,52.1,51.7,51.5,51.7,51.9,52.1,52.6,53.0,53.1,54.0,54.3,55.2,55.8,56.1,56.3,56.6,56.8,57.4,58.01 +Channel Islands,68.71,69.09,69.43,69.72,69.97,70.19,70.37,70.52,70.64,70.74,70.83,70.93,71.03,71.14,71.27,71.39,71.51,71.62,71.73,71.82,71.92,72.02,72.13,72.26,72.41,72.58,72.77,72.98,73.21,73.44,73.67,73.89,74.1,74.3,74.49,74.68,74.87,75.07,75.29,75.51,75.73,75.94,76.14,76.34,76.53,76.72,76.92,77.14,77.37,77.61,77.87,78.14,78.41,78.67,78.93,79.16,79.38,79.57,79.75,79.9,80.05,80.19,80.32,80.47,80.61,80.75 +Chile,54.35,54.56,54.79,55.03,55.29,55.57,55.86,56.16,56.5,56.85,57.23,57.63,58.07,58.54,59.03,59.54,60.07,60.61,61.17,61.74,62.34,62.98,63.63,64.31,65.02,65.75,66.5,67.25,67.99,68.7,69.36,69.97,70.51,71.0,71.42,71.8,72.14,72.47,72.79,73.1,74.1,75.0,75.2,75.3,75.4,75.7,76.2,76.6,76.9,77.3,77.4,77.7,77.8,78.0,78.2,78.2,78.3,78.5,78.5,78.5,78.9,79.1,79.1,79.2,79.4,79.6 +China,41.98,42.91,43.85,45.7,47.2,49.57,49.62,49.17,37.36,30.53,32.95,43.29,50.64,52.0,54.28,55.37,56.9,57.87,59.38,61.0,62.04,61.36,60.97,60.63,60.78,60.46,61.94,62.15,62.95,63.92,64.2,65.28,65.49,65.68,65.87,66.05,66.23,66.39,66.56,66.7,67.0,67.2,67.5,67.9,68.4,68.8,69.1,69.4,69.6,69.8,70.0,70.2,70.9,71.4,71.9,72.6,73.1,73.4,73.9,74.3,74.9,75.3,75.7,75.9,76.2,76.5 +Colombia,49.7,50.93,52.08,53.16,54.15,55.07,55.91,56.69,57.39,58.03,58.63,59.18,59.71,60.21,60.7,61.16,61.6,62.03,62.43,62.83,63.23,63.64,64.08,64.53,65.04,65.58,66.17,66.79,67.43,68.07,68.67,69.24,69.72,70.13,70.48,70.74,70.96,71.14,71.32,71.5,71.1,71.1,71.4,71.6,72.0,72.2,72.8,73.1,73.2,73.3,73.5,73.7,74.5,74.7,75.1,75.3,75.9,76.2,76.2,76.4,77.0,77.3,77.5,77.8,78.0,78.2 +Comoros,40.58,40.91,41.25,41.61,41.99,42.38,42.78,43.19,43.61,44.04,44.47,44.89,45.32,45.75,46.18,46.63,47.1,47.58,48.09,48.61,49.12,49.63,50.12,50.59,51.03,51.46,51.89,52.3,52.72,53.15,53.59,54.03,54.48,54.93,55.36,55.77,56.15,56.5,56.81,57.1,57.4,57.8,58.2,58.5,58.9,58.4,59.4,60.0,61.4,62.1,63.0,63.8,64.8,65.5,66.0,66.3,66.6,67.1,66.7,67.7,67.2,67.6,67.8,68.0,68.1,68.2 +"Congo, Dem. 
Rep.",40.07,40.58,41.06,41.53,41.97,42.39,42.79,43.17,43.54,43.9,44.25,44.61,44.98,45.36,45.77,46.2,46.66,47.14,47.63,48.13,48.6,49.05,49.46,49.83,50.17,50.49,50.8,51.11,51.43,51.76,52.09,52.41,52.72,53.0,53.28,53.55,53.81,54.07,54.31,54.5,54.4,54.3,54.3,54.3,54.0,51.8,53.2,53.5,54.0,54.3,54.5,54.7,54.9,55.9,56.4,56.8,57.1,57.5,57.9,58.4,58.8,59.1,59.6,60.1,60.8,61.51 +"Congo, Rep.",41.81,42.56,43.32,44.05,44.78,45.5,46.21,46.92,47.6,48.25,48.88,49.47,50.04,50.55,51.02,51.45,51.84,52.21,52.54,52.85,53.14,53.42,53.69,53.94,54.2,54.45,54.71,54.97,55.22,55.45,55.65,55.81,55.93,55.98,55.94,55.79,55.54,55.21,54.78,54.3,54.4,54.4,53.5,53.2,52.6,52.2,46.3,49.9,51.6,52.5,53.5,54.3,55.0,55.8,56.7,57.8,58.3,58.8,59.8,60.4,60.9,61.3,61.5,61.5,61.5,61.5 +Costa Rica,56.6,57.19,57.79,58.38,58.98,59.57,60.17,60.77,61.37,61.97,62.56,63.13,63.7,64.26,64.8,65.33,65.85,66.35,66.84,67.34,67.86,68.4,68.95,69.53,70.12,70.75,71.38,72.0,72.62,73.2,73.73,74.22,74.66,75.04,75.37,75.66,75.9,76.14,76.37,76.6,76.5,76.6,76.6,76.7,76.8,76.8,77.0,77.2,77.5,77.7,78.0,78.2,78.4,78.7,79.0,79.3,79.6,79.8,79.8,79.8,79.9,80.0,80.1,80.2,80.3,80.4 +Cote d'Ivoire,32.0,32.54,33.1,33.71,34.36,35.03,35.75,36.49,37.24,38.0,38.74,39.46,40.17,40.84,41.51,42.21,42.93,43.7,44.53,45.38,46.27,47.15,48.02,48.85,49.63,50.37,51.06,51.7,52.31,52.87,53.38,53.87,54.31,54.69,55.02,55.26,55.43,55.5,55.46,55.3,54.9,54.4,53.7,53.2,52.5,52.3,52.3,52.2,52.2,52.0,52.1,52.3,52.6,52.8,53.4,54.1,54.9,55.4,56.0,56.6,57.0,57.5,58.1,58.5,59.1,59.71 +Croatia,60.57,61.08,61.6,62.1,62.58,63.06,63.52,63.98,64.41,64.85,65.26,65.66,66.05,66.43,66.8,67.16,67.52,67.87,68.22,68.54,68.86,69.14,69.4,69.63,69.83,70.0,70.16,70.3,70.42,70.56,70.71,70.89,71.08,71.31,71.54,71.78,72.0,72.22,72.4,72.6,71.9,72.3,72.9,73.4,73.0,73.4,73.4,73.5,73.8,74.2,74.6,74.9,75.1,75.3,75.7,75.9,76.0,76.2,76.4,76.7,77.1,77.4,77.6,77.8,77.8,77.8 +Cuba,58.53,59.12,59.71,60.29,60.89,61.48,62.07,62.66,63.25,63.85,64.47,65.09,65.71,66.35,66.99,67.6,68.2,68.78,69.32,69.84,70.34,70.82,71.29,71.74,72.18,72.59,72.96,73.3,73.59,73.84,74.05,74.22,74.36,74.48,74.57,74.62,74.65,74.67,74.67,74.7,74.8,74.7,74.7,74.8,75.0,75.2,75.4,75.6,75.8,76.2,76.4,76.8,76.9,77.0,77.1,77.3,77.5,77.6,77.7,77.8,77.9,78.0,78.0,78.1,78.2,78.3 +Cyprus,66.13,66.58,67.03,67.45,67.87,68.26,68.65,69.01,69.38,69.72,70.06,70.38,70.71,71.02,71.33,71.62,71.92,72.19,72.47,72.73,72.99,73.23,73.47,73.7,73.93,74.15,74.37,74.58,74.79,74.99,75.19,75.38,75.58,75.76,75.95,76.12,76.3,76.47,76.64,76.8,76.4,76.7,76.8,76.4,76.7,77.1,77.1,77.1,77.5,77.7,78.5,78.7,79.0,79.1,79.0,79.5,79.8,80.0,80.3,80.6,81.1,81.5,81.7,81.7,81.8,81.9 +Czech Republic,65.32,66.94,67.64,68.14,69.06,69.47,69.14,70.05,70.04,70.58,70.77,70.04,70.56,70.73,70.43,70.65,70.55,70.11,69.62,69.72,69.96,70.49,70.33,70.42,70.77,70.88,70.94,71.02,71.13,70.67,71.11,71.22,71.0,71.26,71.48,71.42,71.87,72.08,72.13,71.8,72.0,72.3,72.7,73.0,73.4,73.8,74.2,74.5,74.7,75.0,75.3,75.4,75.6,75.9,76.2,76.5,76.8,77.1,77.3,77.5,77.8,78.1,78.3,78.6,78.8,79.0 +Denmark,70.97,70.82,71.2,71.4,71.97,72.11,71.87,72.3,72.29,72.28,72.55,72.43,72.52,72.61,72.49,72.57,73.06,73.27,73.36,73.49,73.55,73.59,73.83,73.96,74.24,73.91,74.82,74.59,74.41,74.3,74.44,74.78,74.65,74.81,74.68,74.86,74.97,75.06,75.1,75.1,75.4,75.4,75.4,75.4,75.6,75.9,76.2,76.7,76.3,77.1,77.2,77.2,77.6,77.8,78.3,78.3,78.4,78.9,79.1,79.4,79.9,80.3,80.3,80.3,80.4,80.5 
+Djibouti,41.48,41.89,42.31,42.77,43.23,43.71,44.21,44.73,45.24,45.77,46.28,46.79,47.3,47.8,48.33,48.9,49.53,50.23,50.99,51.75,52.51,53.2,53.83,54.38,54.85,55.29,55.71,56.15,56.61,57.1,57.59,58.08,58.55,58.97,59.38,59.74,60.09,60.42,60.72,61.0,60.7,60.4,60.7,60.0,60.4,60.3,60.1,60.0,59.9,60.0,60.1,60.2,60.3,60.4,60.7,60.7,61.5,61.8,62.1,62.3,62.5,62.8,63.1,63.1,63.8,64.51 +Dominican Republic,45.6,46.5,47.39,48.27,49.15,50.01,50.87,51.71,52.54,53.37,54.17,54.97,55.75,56.52,57.28,58.02,58.75,59.47,60.16,60.83,61.47,62.09,62.67,63.23,63.75,64.25,64.73,65.19,65.65,66.12,66.6,67.11,67.63,68.18,68.75,69.34,69.96,70.58,71.2,71.8,72.2,72.5,72.5,72.5,72.6,72.6,72.9,72.9,73.2,73.3,73.4,73.5,73.5,73.1,73.3,73.5,73.7,74.1,74.3,74.4,74.6,74.7,74.9,75.1,75.3,75.5 +Ecuador,48.06,48.64,49.23,49.87,50.54,51.23,51.93,52.65,53.38,54.09,54.77,55.42,56.01,56.53,57.02,57.47,57.89,58.32,58.76,59.21,59.67,60.16,60.67,61.18,61.73,62.3,62.9,63.51,64.16,64.82,65.49,66.17,66.85,67.53,68.18,68.83,69.46,70.06,70.64,71.2,71.4,71.7,71.8,72.2,72.3,72.5,72.7,72.8,73.1,73.2,73.4,73.6,73.7,73.9,74.1,74.3,74.5,74.7,74.9,75.1,75.3,75.5,75.6,75.8,75.9,76.0 +Egypt,39.32,40.72,42.03,43.22,44.3,45.29,46.17,46.97,47.68,48.31,48.89,49.43,49.94,50.42,50.88,51.29,51.65,51.97,52.25,52.54,52.88,53.31,53.84,54.46,55.17,55.93,56.69,57.45,58.16,58.85,59.52,60.21,60.93,61.65,62.38,63.07,63.7,64.27,64.76,65.2,65.4,66.1,66.4,66.7,67.4,67.9,68.2,68.6,69.0,69.7,69.7,69.8,69.8,69.9,70.1,70.1,70.3,70.2,70.1,70.1,70.4,70.5,71.0,71.3,71.5,71.7 +El Salvador,44.11,45.06,45.99,46.9,47.8,48.68,49.55,50.39,51.22,52.02,52.77,53.5,54.18,54.81,55.4,55.93,56.41,56.84,57.24,57.57,57.85,58.07,58.22,58.33,58.36,58.33,58.22,58.09,57.98,57.96,58.13,58.53,59.19,60.11,61.24,62.54,63.91,65.28,66.55,67.7,68.1,68.9,69.3,69.6,70.0,70.3,70.8,71.0,71.6,71.9,71.7,72.5,72.6,72.8,73.0,73.3,73.5,73.7,73.8,74.1,74.3,74.5,74.6,74.8,74.9,75.0 +Equatorial Guinea,34.55,34.9,35.25,35.59,35.95,36.3,36.65,36.99,37.34,37.69,38.04,38.38,38.73,39.08,39.44,39.78,40.13,40.48,40.82,41.17,41.52,41.87,42.21,42.56,42.91,43.28,43.65,44.04,44.44,44.85,45.26,45.69,46.11,46.52,46.92,47.33,47.73,48.14,48.52,48.9,48.7,48.7,48.6,48.5,48.5,48.9,50.3,51.2,52.0,52.9,54.0,54.9,55.3,55.9,56.0,56.8,57.1,57.5,58.0,58.6,58.7,59.4,60.5,61.0,61.0,61.0 +Eritrea,36.47,36.75,37.02,37.29,37.58,37.86,38.14,38.42,38.73,39.03,39.35,39.69,40.04,40.41,40.81,41.22,41.66,42.1,42.56,43.02,43.47,43.92,44.35,44.75,45.14,45.49,45.8,46.09,46.38,46.66,46.97,47.33,47.74,48.21,48.77,49.38,50.06,50.8,51.58,52.4,53.4,54.9,56.2,57.0,57.8,58.4,59.0,58.8,52.2,37.6,59.9,60.0,59.9,60.0,59.9,60.0,60.1,60.1,60.1,60.1,60.2,60.3,60.4,60.6,60.7,60.8 +Estonia,59.91,61.13,63.7,65.05,65.73,67.36,67.84,68.29,68.72,69.42,69.74,69.93,69.99,70.74,70.81,70.78,71.08,70.7,70.4,70.51,70.71,70.48,70.83,70.94,70.26,69.88,70.01,69.87,69.66,69.75,69.62,70.03,69.95,69.83,69.97,71.11,71.13,71.17,70.73,70.1,69.6,69.3,68.2,66.3,67.7,69.8,70.0,69.5,70.2,70.4,70.0,70.9,71.5,72.0,72.5,72.9,73.0,74.2,74.9,76.4,76.3,76.7,77.5,77.6,77.8,78.0 +Ethiopia,33.09,33.41,33.8,34.23,34.72,35.25,32.41,30.37,37.08,37.72,38.35,38.94,39.49,39.36,38.13,39.09,41.09,41.38,41.65,41.9,42.14,41.98,39.85,37.71,38.78,42.86,42.41,42.07,42.74,42.8,42.87,42.93,42.5,39.46,35.43,41.39,43.95,44.4,44.82,45.2,46.9,47.8,48.4,48.8,49.2,50.0,50.6,51.1,50.6,52.1,52.7,53.6,54.3,55.2,56.1,57.2,58.6,60.0,61.2,62.1,62.9,63.6,64.2,64.7,65.2,65.7 
+Fiji,51.3,51.85,52.38,52.9,53.4,53.89,54.36,54.81,55.26,55.7,56.12,56.54,56.94,57.35,57.75,58.14,58.52,58.89,59.26,59.61,59.96,60.29,60.6,60.91,61.21,61.5,61.8,62.09,62.37,62.65,62.92,63.2,63.46,63.71,63.96,64.2,64.43,64.66,64.88,65.1,65.1,65.0,64.8,64.7,64.5,64.3,64.1,64.2,64.1,64.2,64.4,64.5,64.6,64.7,64.8,64.8,64.9,64.9,64.9,65.2,65.3,65.4,65.6,65.7,65.8,65.9 +Finland,65.68,66.56,66.63,67.59,67.39,68.01,67.51,68.65,68.83,69.03,69.07,68.78,69.19,69.4,69.16,69.68,69.86,69.82,69.7,70.4,70.22,70.91,71.42,71.34,71.89,72.04,72.56,73.13,73.42,73.71,74.03,74.6,74.51,74.82,74.49,74.86,74.89,74.85,75.07,75.1,75.4,75.7,76.0,76.4,76.7,76.8,77.1,77.3,77.5,77.8,78.1,78.3,78.5,78.8,79.0,79.2,79.4,79.6,79.8,80.0,80.3,80.5,80.8,80.9,80.9,80.9 +France,66.17,67.46,67.4,68.27,68.54,68.57,69.0,70.24,70.27,70.49,71.07,70.61,70.46,71.43,71.26,71.67,71.67,71.66,71.4,72.29,72.27,72.52,72.69,73.04,73.13,73.38,73.99,74.12,74.43,74.53,74.69,75.07,75.06,75.56,75.67,75.95,76.55,76.78,76.91,77.2,77.3,77.6,77.7,78.0,78.2,78.4,78.7,78.7,78.8,79.1,79.2,79.4,79.6,80.2,80.4,80.7,81.0,81.1,81.2,81.4,81.6,81.6,81.7,81.7,81.8,81.9 +French Guiana,52.52,53.05,53.58,54.12,54.67,55.22,55.78,56.37,57.0,57.68,58.44,59.28,60.19,61.14,62.1,63.0,63.8,64.46,64.97,65.34,65.57,65.71,65.81,65.91,66.04,66.24,66.51,66.87,67.3,67.79,68.31,68.83,69.33,69.79,70.2,70.57,70.92,71.27,71.6,71.94,72.27,72.61,72.93,73.25,73.56,73.84,74.1,74.34,74.55,74.75,74.92,75.07,75.21,75.35,75.5,75.65,75.82,76.01,76.21,76.43,76.65,76.89,77.12,77.35,77.58,77.81 +French Polynesia,46.52,48.28,49.86,51.27,52.5,53.55,54.44,55.18,55.78,56.28,56.71,57.09,57.47,57.85,58.24,58.65,59.06,59.45,59.83,60.18,60.52,60.84,61.15,61.47,61.82,62.23,62.72,63.28,63.9,64.56,65.22,65.84,66.39,66.87,67.27,67.59,67.88,68.15,68.42,68.7,69.01,69.33,69.68,70.05,70.43,70.82,71.21,71.59,71.96,72.31,72.67,73.03,73.4,73.77,74.13,74.48,74.81,75.11,75.38,75.62,75.84,76.05,76.26,76.47,76.69,76.91 +Gabon,35.84,36.34,36.8,37.19,37.54,37.83,38.1,38.33,38.56,38.83,39.15,39.56,40.07,40.7,41.42,42.21,43.06,43.9,44.74,45.55,46.35,47.13,47.9,48.68,49.45,50.23,51.01,51.81,52.61,53.42,54.24,55.07,55.88,56.66,57.4,58.04,58.58,59.0,59.32,59.5,59.8,60.2,60.1,59.9,59.8,59.6,59.9,60.0,59.7,59.3,59.0,59.4,59.4,59.4,60.1,60.9,61.6,61.7,62.1,63.0,63.3,63.9,64.4,65.0,65.9,66.81 +Gambia,31.85,32.33,32.78,33.22,33.65,34.06,34.46,34.86,35.27,35.7,36.16,36.68,37.26,37.91,38.66,39.47,40.36,41.3,42.3,43.31,44.36,45.42,46.47,47.51,48.56,49.58,50.6,51.61,52.62,53.61,54.59,55.55,56.48,57.37,58.21,58.97,59.65,60.26,60.81,61.3,61.5,61.5,62.0,62.3,62.6,62.8,63.1,63.4,63.4,63.6,63.9,63.8,64.4,64.7,64.9,65.2,65.3,65.7,66.0,66.5,67.1,67.5,67.8,68.0,68.1,68.2 +Georgia,59.96,60.36,60.75,61.15,61.54,61.93,62.32,62.72,63.11,63.5,63.9,64.31,64.71,65.11,65.52,65.9,66.26,66.6,66.93,67.24,67.54,67.85,68.17,68.47,68.76,69.0,69.19,69.31,69.37,69.4,69.42,69.46,69.52,69.62,69.75,69.86,69.94,69.96,69.95,69.9,69.9,69.4,69.2,70.2,70.7,71.2,71.3,71.4,71.4,71.4,71.7,71.6,71.7,71.5,71.8,71.9,72.1,71.8,72.1,72.2,72.2,72.4,72.5,72.6,72.9,73.2 +Germany,67.08,67.4,67.7,68.0,68.28,68.57,68.49,69.23,69.34,69.26,69.85,70.01,70.1,70.66,70.65,70.77,70.99,70.64,70.48,70.72,70.94,71.16,71.41,71.71,71.56,72.02,72.63,72.6,72.96,73.14,73.37,73.69,73.97,74.44,74.55,74.75,75.15,75.33,75.51,75.4,75.6,76.0,76.1,76.4,76.6,76.9,77.3,77.6,77.8,78.1,78.4,78.6,78.8,79.2,79.4,79.7,79.9,80.0,80.1,80.3,80.5,80.6,80.7,80.7,80.8,80.9 
+Ghana,41.66,42.22,42.76,43.3,43.83,44.36,44.87,45.37,45.86,46.34,46.8,47.25,47.66,48.07,48.44,48.8,49.14,49.46,49.78,50.08,50.39,50.7,51.02,51.35,51.68,52.0,52.33,52.63,52.95,53.26,53.6,53.95,54.34,54.76,55.23,55.75,56.31,56.89,57.47,58.0,58.4,58.7,59.5,59.6,60.0,60.1,59.8,60.1,60.1,60.0,59.9,60.0,60.2,60.5,60.8,61.2,61.6,62.0,62.4,62.9,63.5,64.1,64.5,64.8,65.3,65.8 +Greece,65.57,65.72,65.92,66.16,66.46,66.79,67.16,67.57,67.99,68.41,68.8,69.14,69.44,69.69,69.91,70.12,70.34,70.59,70.88,71.2,71.53,71.85,72.13,72.39,72.62,72.85,73.1,73.38,73.68,74.01,74.33,74.64,74.94,75.21,75.47,75.73,76.01,76.32,76.66,77.0,77.1,77.1,77.5,77.7,77.8,77.9,78.1,78.2,78.3,78.6,78.9,79.1,79.3,79.4,79.6,80.0,79.8,80.2,80.2,80.4,80.5,80.6,81.0,81.0,81.0,81.0 +Greenland,43.94,45.59,48.67,51.76,54.85,57.94,58.82,59.71,60.6,61.49,61.85,62.22,62.59,62.97,63.34,63.71,64.08,64.45,64.82,65.19,65.01,64.84,64.66,64.49,64.31,64.14,63.96,63.78,63.61,63.09,62.71,62.8,62.89,63.05,63.42,63.81,64.22,64.14,64.22,64.6,65.1,65.5,65.9,66.3,66.5,66.8,66.9,67.2,67.5,67.8,68.0,68.3,68.5,68.8,69.1,69.5,70.0,70.3,70.6,70.8,71.2,71.6,71.8,72.0,72.1,72.2 +Grenada,55.81,56.39,56.97,57.52,58.07,58.61,59.12,59.63,60.11,60.59,61.05,61.49,61.93,62.35,62.76,63.16,63.54,63.91,64.27,64.62,64.97,65.29,65.62,65.92,66.22,66.52,66.79,67.07,67.33,67.6,67.86,68.1,68.35,68.59,68.83,69.06,69.28,69.5,69.7,69.9,70.2,70.2,70.0,70.4,70.7,70.8,70.8,70.6,70.6,70.5,70.3,70.2,70.2,69.3,70.3,70.5,70.7,70.8,70.9,71.0,71.0,71.1,71.2,71.4,71.5,71.6 +Guadeloupe,52.09,52.94,53.77,54.57,55.35,56.11,56.84,57.55,58.24,58.91,59.58,60.23,60.87,61.51,62.14,62.75,63.34,63.91,64.46,64.98,65.49,65.99,66.49,66.97,67.46,67.93,68.4,68.86,69.31,69.75,70.18,70.6,71.01,71.42,71.82,72.21,72.6,72.98,73.35,73.72,74.08,74.44,74.79,75.14,75.48,75.82,76.15,76.48,76.8,77.12,77.43,77.74,78.04,78.35,78.65,78.95,79.25,79.55,79.85,80.14,80.43,80.69,80.95,81.18,81.41,81.64 +Guam,56.53,57.04,57.55,58.08,58.6,59.12,59.65,60.18,60.71,61.24,61.76,62.28,62.79,63.29,63.78,64.26,64.72,65.18,65.63,66.06,66.49,66.9,67.3,67.7,68.07,68.43,68.79,69.13,69.47,69.8,70.12,70.42,70.73,71.02,71.3,71.58,71.84,72.09,72.35,72.6,72.4,72.4,72.5,72.7,73.0,73.2,69.4,73.4,73.5,73.6,73.6,73.6,73.5,73.3,73.1,72.7,72.4,72.1,71.8,71.6,71.5,71.5,71.6,71.6,71.7,71.8 +Guatemala,42.06,42.44,42.83,43.27,43.73,44.23,44.77,45.32,45.91,46.51,47.12,47.76,48.4,49.05,49.73,50.43,51.16,51.93,52.72,53.5,54.27,55.0,55.67,56.28,56.82,57.32,57.78,58.22,58.66,59.12,59.6,60.1,60.62,61.16,61.72,62.28,62.86,63.45,64.02,64.6,64.0,63.8,64.2,64.6,66.9,68.1,67.7,67.7,68.8,68.8,69.3,70.0,70.1,70.2,69.8,70.2,71.0,71.2,70.9,71.2,71.6,72.1,72.3,72.4,72.6,72.8 +Guinea,33.12,33.44,33.74,34.04,34.35,34.64,34.92,35.2,35.46,35.71,35.95,36.17,36.37,36.57,36.77,36.96,37.16,37.37,37.61,37.89,38.2,38.57,38.97,39.43,39.94,40.47,41.04,41.63,42.25,42.92,43.66,44.47,45.37,46.32,47.33,48.35,49.37,50.33,51.22,52.0,52.3,52.5,53.0,53.1,53.4,53.8,54.0,54.0,54.0,54.2,54.4,54.7,55.1,55.6,56.0,56.4,56.8,57.1,57.5,57.9,58.2,58.5,58.8,58.6,59.1,59.6 +Guinea-Bissau,39.65,40.03,40.42,40.81,41.2,41.58,41.97,42.36,42.75,43.14,43.39,43.64,43.89,44.15,44.39,44.63,44.86,45.09,45.29,45.5,45.71,45.91,46.12,46.33,46.54,46.77,47.02,47.27,47.54,47.83,48.13,48.45,48.78,49.13,49.49,49.87,50.26,50.67,51.09,51.5,51.7,51.8,52.0,52.2,52.3,52.6,52.8,51.7,52.5,52.8,52.7,52.7,52.8,52.8,52.9,53.0,53.2,53.6,53.9,54.3,54.5,54.8,55.1,55.3,55.6,55.9 
+Guyana,57.51,57.68,57.85,58.04,58.21,58.38,58.56,58.73,58.9,59.08,59.24,59.41,59.58,59.75,59.92,60.09,60.25,60.43,60.59,60.75,60.92,61.08,61.24,61.4,61.56,61.72,61.88,62.03,62.18,62.34,62.5,62.67,62.85,63.02,63.21,63.41,63.61,63.8,64.01,64.2,64.3,64.5,64.4,64.5,64.4,64.3,64.3,64.3,64.3,64.2,63.9,63.5,63.7,64.2,64.4,64.8,64.9,65.0,65.3,65.5,65.6,65.9,66.2,66.4,66.8,67.2 +Haiti,36.56,37.22,37.87,38.5,39.12,39.74,40.34,40.93,41.52,42.1,42.68,43.26,43.82,44.38,44.93,45.43,45.9,46.33,46.73,47.1,47.45,47.81,48.17,48.53,48.9,49.28,49.63,49.97,50.3,50.62,50.94,51.29,51.64,52.02,52.42,52.81,53.21,53.59,53.95,54.3,54.4,54.9,54.7,55.4,56.2,56.7,57.0,57.5,58.0,58.7,59.2,59.6,59.7,58.6,60.0,60.3,60.8,61.0,61.7,32.2,62.4,62.9,63.4,63.8,64.3,64.8 +Honduras,41.86,42.39,42.95,43.54,44.16,44.83,45.52,46.23,46.97,47.71,48.47,49.21,49.94,50.65,51.35,52.02,52.68,53.34,54.0,54.68,55.37,56.07,56.8,57.56,58.34,59.15,59.97,60.8,61.65,62.5,63.36,64.24,65.12,66.01,66.86,67.67,68.42,69.11,69.73,70.3,70.3,70.1,69.9,70.1,70.1,70.1,70.2,63.9,70.3,70.5,70.6,70.7,70.8,71.0,71.2,71.4,71.6,71.8,71.9,72.0,72.2,72.3,72.6,72.8,73.0,73.2 +"Hong Kong, China",62.38,62.9,63.43,63.98,64.54,65.11,65.69,66.28,66.87,67.45,68.01,68.55,69.05,69.52,69.96,70.37,70.77,71.15,71.53,71.89,72.25,72.58,72.9,73.2,73.49,73.78,74.06,74.35,74.64,74.93,75.22,75.5,75.77,76.03,76.28,76.53,76.78,77.02,77.27,77.52,77.77,78.01,78.25,78.48,78.72,78.99,79.29,79.63,79.99,80.36,80.73,81.08,81.4,81.68,81.92,82.12,82.31,82.49,82.66,82.84,83.02,83.2,83.38,83.56,83.73,83.9 +Hungary,62.48,64.05,63.89,65.46,66.91,66.07,66.44,67.45,67.35,68.13,69.06,68.0,69.02,69.52,69.22,69.98,69.55,69.38,69.45,69.29,69.19,69.82,69.69,69.41,69.46,69.75,70.02,69.56,69.77,69.18,69.24,69.47,69.03,69.07,69.01,69.22,69.69,70.09,69.53,69.5,69.2,69.1,69.2,69.5,70.1,70.5,70.9,71.1,71.3,71.8,72.3,72.6,72.7,72.9,73.1,73.3,73.6,73.9,74.3,74.6,75.0,75.5,76.1,76.5,76.7,76.9 +Iceland,71.12,72.57,72.39,73.45,73.4,73.08,73.58,73.55,72.78,74.22,73.6,73.82,73.13,73.72,74.0,73.4,73.9,74.12,73.9,74.0,73.75,74.66,74.52,74.59,75.57,76.94,76.35,76.66,76.88,76.92,76.61,77.26,76.91,77.71,77.85,78.38,77.53,77.39,78.46,78.3,78.4,78.6,78.8,79.1,78.9,79.4,79.6,79.9,80.2,80.5,80.8,81.0,81.3,81.5,81.7,81.8,82.1,82.4,82.5,82.8,82.9,83.1,83.2,83.3,83.3,83.3 +India,35.1,35.76,36.44,37.11,37.79,38.48,39.16,39.85,40.56,41.26,41.99,42.72,43.46,44.23,44.98,45.73,46.49,47.21,47.93,48.65,49.35,50.08,50.81,51.53,52.25,52.93,53.56,54.14,54.65,55.1,55.51,55.86,56.19,56.51,56.81,57.11,57.39,57.65,57.93,58.2,58.5,58.8,59.1,59.5,59.9,60.2,60.5,60.8,61.2,61.5,61.9,62.3,62.8,63.2,63.6,63.9,64.3,64.7,65.0,65.4,65.7,66.1,66.5,66.9,67.2,67.5 +Indonesia,36.99,37.93,38.86,39.78,40.68,41.57,42.45,43.32,44.17,45.01,45.83,46.65,47.45,48.24,43.77,44.18,50.54,51.27,52.0,52.71,53.4,54.09,54.75,55.41,56.04,56.67,57.27,57.87,58.45,59.01,59.57,60.12,60.64,61.16,61.66,62.15,62.63,63.1,63.55,64.0,64.5,64.9,65.3,65.7,66.1,66.4,66.7,67.0,67.2,67.5,67.8,68.0,68.2,66.7,68.7,68.9,69.2,69.4,69.6,69.8,70.1,70.3,70.6,70.8,71.1,71.4 +Iran,40.29,40.92,41.56,42.19,42.84,43.47,44.11,44.74,45.38,46.0,46.61,47.22,47.83,48.43,49.04,49.66,50.3,50.98,51.67,52.43,53.28,54.24,55.24,56.24,57.1,57.64,57.78,57.52,56.95,56.24,55.62,55.32,55.49,56.19,57.39,59.01,60.83,62.67,64.43,66.0,67.8,68.5,69.1,69.6,69.9,69.8,70.3,70.8,71.3,71.4,71.3,71.3,70.1,71.5,71.9,72.4,72.8,73.1,73.4,73.7,74.1,74.3,74.5,74.6,74.6,74.6 
+Iraq,35.08,36.58,38.04,39.45,40.81,42.11,43.38,44.61,45.79,46.96,48.11,49.24,50.36,51.46,52.52,53.51,54.42,55.24,55.96,56.61,57.21,57.77,58.31,58.81,59.19,59.35,59.26,58.94,58.44,57.91,57.52,57.41,57.68,58.34,59.36,60.65,62.05,63.41,64.65,65.7,63.9,65.4,65.4,65.4,65.3,65.3,65.2,65.7,65.9,65.8,66.4,66.1,66.1,66.3,65.7,65.1,65.3,66.6,67.1,67.3,67.7,68.1,68.3,67.7,67.4,67.1 +Ireland,65.07,67.52,68.3,68.44,68.46,69.43,69.51,69.84,69.99,70.76,70.24,70.57,70.85,71.12,71.35,70.89,71.95,71.67,71.62,71.68,72.5,71.86,72.11,72.08,72.68,72.81,72.98,72.98,73.28,73.66,74.04,74.34,74.4,74.87,74.84,74.93,75.76,75.82,75.87,76.3,76.7,76.8,76.9,77.3,77.1,77.5,77.4,77.6,77.7,77.8,78.4,78.8,79.1,79.3,79.7,79.8,80.1,80.1,80.3,81.0,80.6,81.1,81.5,81.6,81.7,81.8 +Israel,64.42,65.04,65.62,66.15,66.65,67.1,67.51,67.89,68.24,68.55,68.85,69.13,69.41,69.68,69.93,70.17,70.39,70.6,70.78,70.96,71.13,71.33,71.54,71.78,72.04,72.33,72.62,72.9,73.19,73.47,73.74,73.99,74.49,74.78,75.1,74.92,75.29,75.65,76.24,76.7,76.5,76.3,76.9,77.1,77.4,77.7,77.9,78.1,78.5,78.6,78.8,78.6,79.1,79.5,79.7,79.6,80.3,80.6,81.0,81.6,81.6,82.1,82.0,81.3,82.1,82.91 +Italy,65.3,65.93,66.56,67.88,68.23,67.62,67.79,68.85,69.3,69.19,69.82,69.21,69.32,70.37,70.24,70.99,71.03,70.85,70.87,71.62,71.87,72.15,72.09,72.81,72.72,73.07,73.44,73.78,74.11,74.07,74.46,74.93,74.75,75.51,75.62,75.94,76.36,76.54,76.94,77.0,77.0,77.3,77.6,77.8,78.1,78.3,78.7,78.9,79.3,79.6,79.8,80.1,80.1,80.9,81.1,81.2,81.3,81.5,81.6,81.9,82.0,82.0,82.1,82.1,82.2,82.3 +Jamaica,58.02,59.06,60.07,61.03,61.95,62.83,63.66,64.47,65.21,65.91,66.57,67.17,67.74,68.25,68.73,69.17,69.58,69.99,70.36,70.72,71.06,71.39,71.71,72.0,72.29,72.58,72.89,73.21,73.52,73.82,74.1,74.34,74.55,74.7,74.79,74.84,74.85,74.85,74.83,74.8,74.9,74.9,74.8,74.8,74.7,74.5,74.4,74.5,74.6,74.4,74.2,74.5,74.8,75.0,75.4,75.5,75.3,75.1,74.8,74.8,74.6,74.7,74.8,74.8,75.0,75.2 +Japan,60.98,63.02,63.36,64.6,65.76,65.62,65.49,67.11,67.49,67.78,68.43,68.71,69.79,70.26,70.31,71.12,71.41,71.73,71.96,72.05,72.87,73.39,73.45,73.88,74.38,74.78,75.35,75.67,76.18,76.16,76.57,77.08,77.11,77.5,77.8,78.22,78.63,78.54,78.97,79.0,79.1,79.3,79.4,79.8,79.7,80.2,80.4,80.5,80.6,81.0,81.3,81.6,81.7,81.9,82.0,82.2,82.4,82.5,82.7,82.7,82.6,82.9,83.0,83.1,83.2,83.3 +Jordan,45.56,46.45,47.34,48.23,49.09,49.95,50.8,51.65,52.48,53.3,54.12,54.94,55.75,56.55,57.35,58.13,58.9,59.66,60.42,61.15,61.87,62.59,63.3,63.98,64.64,65.28,65.89,66.46,67.0,67.51,68.0,68.45,68.9,69.33,69.75,70.15,70.52,70.87,71.2,71.5,71.9,72.2,72.2,72.4,72.5,72.6,72.8,73.0,73.2,73.4,73.6,73.8,74.0,74.1,74.5,75.5,76.3,76.9,77.5,77.9,78.1,78.2,78.3,78.4,78.5,78.6 +Kazakhstan,54.67,55.15,55.63,56.11,56.58,57.05,57.51,57.98,58.44,58.91,59.38,59.85,60.31,60.79,61.24,61.69,62.12,62.53,62.92,63.27,63.6,63.89,64.16,64.4,64.64,64.88,65.13,65.41,65.71,66.05,66.43,66.84,67.28,67.7,68.07,68.34,68.48,68.47,68.31,68.0,67.6,67.1,65.3,64.6,63.6,63.5,63.9,64.2,65.0,64.9,65.1,65.4,65.3,65.3,65.3,65.3,65.8,67.1,68.2,68.5,69.1,69.7,70.0,70.2,70.2,70.2 +Kenya,42.33,42.71,43.16,43.64,44.17,44.75,45.37,46.03,46.72,47.42,48.13,48.82,49.48,50.13,50.75,51.35,51.96,52.57,53.19,53.83,54.45,55.08,55.69,56.29,56.89,57.49,58.1,58.74,59.36,59.96,60.52,61.02,61.42,61.74,61.96,62.09,62.13,62.1,62.0,61.8,61.1,60.3,59.5,58.7,58.1,57.4,56.7,56.1,55.8,55.6,55.6,55.7,55.8,56.2,57.2,58.4,59.8,60.8,61.9,62.9,63.7,64.3,64.8,65.0,65.1,65.2 
+Kiribati,42.25,42.65,43.05,43.44,43.85,44.25,44.64,45.04,45.45,45.84,46.24,46.64,47.03,47.44,47.84,48.23,48.63,49.02,49.42,49.82,50.21,50.61,51.02,51.41,51.81,52.2,52.58,52.97,53.36,53.75,54.17,54.62,55.08,55.56,56.04,56.5,56.93,57.33,57.68,58.0,58.2,58.4,58.4,58.7,58.9,59.2,59.4,59.5,59.6,59.8,60.1,60.2,60.4,60.6,60.8,61.0,61.2,61.5,61.7,61.9,62.1,62.3,62.6,62.8,63.0,63.2 +Kuwait,52.95,54.13,55.27,56.36,57.43,58.45,59.44,60.38,61.29,62.15,62.98,63.77,64.53,65.24,65.92,66.58,67.2,67.8,68.38,68.93,69.46,69.97,70.46,70.93,71.39,71.85,72.29,72.72,73.14,73.58,74.0,74.41,74.81,75.22,75.59,75.95,76.29,76.62,76.92,77.2,64.4,80.0,78.7,77.6,76.5,76.0,76.2,76.3,77.3,77.7,77.6,78.2,78.5,78.1,77.7,77.7,77.7,77.3,77.4,78.5,79.0,79.1,79.7,80.2,80.3,80.4 +Kyrgyz Republic,52.07,52.52,52.96,53.41,53.86,54.31,54.75,55.2,55.64,56.09,56.54,56.99,57.44,57.9,58.34,58.76,59.19,59.58,59.95,60.3,60.61,60.88,61.14,61.38,61.6,61.83,62.05,62.3,62.57,62.89,63.23,63.62,64.04,64.45,64.86,65.23,65.54,65.77,65.93,66.0,65.9,65.6,65.3,65.0,65.1,65.2,65.3,65.6,65.8,65.9,66.0,65.9,66.0,66.2,66.5,66.7,67.0,67.3,67.7,67.9,68.5,69.0,69.4,69.6,69.8,70.0 +Lao,39.88,40.13,40.37,40.62,40.86,41.11,41.37,41.62,41.87,42.13,42.38,42.64,42.89,43.13,43.39,43.64,43.89,44.15,44.41,44.66,44.91,45.16,45.39,45.62,45.85,46.05,46.26,46.47,46.69,46.91,47.17,47.45,47.76,48.12,48.54,49.02,49.56,50.16,50.8,51.5,52.0,52.4,52.8,53.2,53.6,54.0,54.4,54.9,55.5,56.1,56.6,57.6,58.4,59.3,60.1,60.8,61.7,62.5,63.3,64.1,65.0,65.6,66.1,66.6,67.1,67.6 +Latvia,60.48,61.88,63.19,64.38,65.46,66.44,67.31,68.07,69.53,70.37,70.6,69.97,70.36,71.62,71.29,71.26,70.94,70.56,70.29,70.31,70.66,70.35,70.29,70.22,69.37,69.48,69.56,69.45,68.93,69.23,69.18,69.75,69.51,69.56,69.72,71.09,71.14,71.05,70.55,69.6,69.1,68.4,66.7,65.7,66.5,68.6,69.3,69.0,70.0,70.5,70.0,70.4,70.8,71.2,71.1,70.8,71.3,72.4,73.3,73.9,74.6,75.1,75.0,75.2,75.4,75.6 +Lebanon,59.61,60.04,60.45,60.85,61.23,61.6,61.95,62.28,62.6,62.9,63.19,63.47,63.74,64.0,64.25,64.5,64.76,65.01,65.27,65.52,65.75,65.98,66.18,66.37,66.54,66.69,66.83,66.96,67.08,67.21,67.36,67.52,67.68,67.87,68.07,68.29,68.53,68.78,69.03,69.3,71.9,72.2,72.5,73.0,73.4,74.0,74.4,74.9,75.6,75.9,76.3,76.6,76.9,77.1,77.3,77.4,77.5,77.8,77.9,78.1,76.6,78.5,78.6,78.7,78.9,79.1 +Lesotho,41.53,42.11,42.72,43.33,43.96,44.59,45.22,45.85,46.46,47.02,47.54,47.97,48.32,48.59,48.79,48.95,49.09,49.24,49.43,49.67,49.96,50.31,50.7,51.14,51.63,52.17,52.75,53.38,54.01,54.65,55.25,55.82,56.34,56.83,57.31,57.88,58.51,59.21,59.92,60.5,60.6,60.4,60.1,59.2,58.7,57.9,56.6,54.6,52.9,50.7,48.9,47.0,45.4,44.2,43.1,43.1,43.3,44.5,45.5,46.4,46.7,46.1,45.6,45.4,47.1,48.86 +Liberia,33.11,33.36,33.6,33.84,34.07,34.28,34.51,34.73,34.98,35.24,35.54,35.88,36.28,36.73,37.23,37.77,38.33,38.91,39.49,40.1,40.75,41.43,42.16,42.91,43.68,44.45,45.21,45.92,46.57,47.14,47.6,47.97,48.25,48.46,48.58,48.62,48.59,48.55,48.53,48.6,51.5,51.8,50.1,48.9,50.9,50.4,53.8,54.4,55.2,55.8,56.3,55.4,55.2,57.9,58.4,58.8,59.3,59.9,60.3,60.8,61.5,62.3,62.9,61.8,63.2,64.63 +Libya,38.07,37.73,37.66,37.89,38.39,39.18,40.22,41.5,42.97,44.59,46.28,48.0,49.69,51.28,52.77,54.15,55.45,56.69,57.88,59.01,60.11,61.16,62.16,63.13,64.06,64.95,65.81,66.62,67.4,68.13,68.82,69.46,70.04,70.58,71.09,71.56,72.03,72.48,72.94,73.4,73.7,73.8,74.2,74.4,74.6,74.6,74.8,74.8,74.9,74.8,75.0,75.0,75.1,75.2,75.4,75.5,75.5,75.6,75.7,75.9,60.5,75.5,75.8,75.0,74.1,73.21 
+Lithuania,63.9,64.52,65.14,65.77,66.38,66.99,67.59,68.19,67.73,70.33,70.52,69.46,70.64,72.0,71.76,71.92,71.99,71.68,71.3,71.16,72.1,71.34,71.7,71.63,71.24,71.38,71.14,70.93,70.8,70.78,70.77,71.17,71.09,70.6,70.78,72.45,72.26,72.1,71.79,71.5,70.5,70.3,69.1,68.7,69.0,70.2,71.1,71.3,71.8,72.1,71.6,72.1,72.1,72.2,71.7,71.5,71.4,72.1,73.6,73.9,74.3,74.7,74.9,75.0,75.2,75.4 +Luxembourg,65.38,65.71,66.04,66.37,66.67,66.98,67.27,67.55,67.83,68.99,69.49,68.59,68.8,68.98,69.31,69.21,69.59,70.17,69.73,69.47,69.35,70.59,70.34,70.42,70.37,70.31,71.61,71.57,72.25,72.42,72.22,72.31,73.19,72.94,73.51,74.44,73.97,74.57,74.49,75.2,75.5,75.8,76.2,76.5,76.9,77.1,77.4,77.7,78.1,78.5,78.7,79.0,79.1,79.5,80.0,80.3,80.6,81.0,81.2,81.3,81.5,81.7,81.9,82.1,82.2,82.3 +"Macao, China",60.25,60.79,61.32,61.84,62.37,62.89,63.41,63.92,64.43,64.93,65.42,65.9,66.36,66.81,67.24,67.66,68.06,68.45,68.83,69.2,69.56,69.91,70.26,70.61,70.95,71.29,71.62,71.94,72.26,72.57,72.88,73.17,73.46,73.75,74.03,74.31,74.58,74.84,75.1,75.36,75.61,75.86,76.1,76.33,76.56,76.78,77.0,77.21,77.42,77.63,77.83,78.04,78.25,78.46,78.67,78.89,79.1,79.32,79.54,79.75,79.97,80.19,80.4,80.61,80.82,81.03 +"Macedonia, FYR",53.65,54.61,55.53,56.4,57.25,58.04,58.79,59.51,60.2,60.85,61.49,62.11,62.72,63.32,63.92,64.51,65.08,65.62,66.14,66.63,67.08,67.48,67.83,68.14,68.41,68.61,68.76,68.88,68.98,69.08,69.21,69.4,69.63,69.92,70.26,70.6,70.93,71.23,71.48,71.7,71.7,71.6,71.5,71.7,71.8,72.1,72.3,72.4,72.6,72.9,73.0,73.3,73.4,73.6,73.8,74.1,74.3,74.5,74.7,75.2,75.6,75.8,76.0,76.2,76.5,76.8 +Madagascar,36.69,37.28,37.86,38.45,39.03,39.62,40.21,40.79,41.38,41.96,42.54,43.12,43.7,44.28,44.85,45.43,46.01,46.6,47.18,47.77,48.36,48.94,49.5,50.06,50.59,51.12,51.63,52.12,52.58,53.01,53.36,53.64,53.86,54.03,54.19,54.38,54.63,54.98,55.43,56.0,56.2,56.4,56.3,56.8,57.2,57.6,58.0,58.3,58.8,59.1,59.6,59.8,60.1,60.6,61.2,61.7,62.0,62.2,62.3,62.4,62.6,62.8,63.0,63.3,63.5,63.7 +Malawi,36.45,36.62,36.81,37.02,37.24,37.48,37.72,37.99,38.25,38.51,38.76,39.02,39.25,39.49,39.75,40.03,40.36,40.73,41.16,41.62,42.09,42.55,43.0,43.41,43.79,44.16,44.54,44.92,45.31,45.72,46.13,46.53,46.91,47.26,47.6,47.9,48.17,48.42,48.64,48.8,48.6,48.3,48.0,47.4,46.9,46.3,45.8,45.3,45.1,45.4,45.9,46.4,47.0,47.5,48.5,49.6,51.0,52.4,53.9,55.4,56.6,58.0,59.3,60.1,60.5,60.9 +Malaysia,54.05,54.72,55.39,56.06,56.72,57.37,58.01,58.65,59.27,59.89,60.48,61.07,61.63,62.17,62.71,63.21,63.7,64.17,64.63,65.08,65.51,65.93,66.34,66.73,67.13,67.5,67.86,68.21,68.56,68.89,69.22,69.53,69.84,70.14,70.45,70.73,71.01,71.28,71.54,71.8,72.0,72.2,72.4,72.4,72.4,72.5,72.8,73.0,73.1,73.3,73.6,73.8,73.9,74.0,74.3,74.5,74.5,74.5,74.3,74.4,74.6,74.7,74.9,75.1,75.3,75.5 +Maldives,33.9,34.18,34.49,34.86,35.27,35.72,36.22,36.78,37.39,38.07,38.82,39.64,40.54,41.48,42.47,43.48,44.49,45.48,46.44,47.37,48.25,49.12,49.98,50.82,51.69,52.59,53.53,54.51,55.53,56.58,57.62,58.64,59.64,60.6,61.51,62.41,63.3,64.19,65.08,66.0,66.7,67.3,67.9,68.6,69.3,70.0,70.8,71.7,72.3,73.0,73.7,74.4,75.3,74.7,76.9,77.5,78.1,78.5,78.9,79.2,79.6,79.8,79.9,80.0,80.0,80.0 +Mali,27.34,27.71,28.04,28.34,28.6,28.84,29.04,29.23,29.42,29.61,29.83,30.08,30.4,30.79,31.26,31.8,32.41,33.07,33.77,34.51,35.27,36.04,36.82,37.61,38.39,39.18,39.97,40.79,41.61,42.45,43.3,44.17,45.02,45.86,46.68,47.45,48.18,48.86,49.47,50.0,50.5,50.8,51.2,51.2,51.4,51.8,52.2,50.9,53.5,53.5,54.1,54.6,55.5,56.2,56.9,57.4,58.0,58.5,58.9,59.2,59.6,59.8,59.8,60.0,60.2,60.4 
+Malta,66.02,66.17,66.35,66.55,66.79,67.06,67.34,67.65,67.97,68.32,68.67,69.02,69.37,69.7,70.03,70.36,70.67,70.98,71.29,71.6,71.9,72.2,72.49,72.78,73.07,73.36,73.63,73.92,74.19,74.47,74.74,75.01,75.28,75.54,75.81,76.08,76.33,76.59,76.84,77.1,77.3,77.5,77.9,78.2,78.4,78.5,78.8,78.9,79.0,79.2,79.4,79.8,80.1,80.3,80.7,81.0,80.9,80.7,81.2,81.3,81.3,81.6,81.7,82.0,82.1,82.2 +Martinique,54.51,55.23,55.93,56.61,57.28,57.93,58.57,59.2,59.81,60.41,61.0,61.58,62.16,62.72,63.28,63.84,64.39,64.93,65.46,65.99,66.51,67.02,67.53,68.02,68.51,69.0,69.47,69.93,70.38,70.82,71.25,71.68,72.09,72.5,72.9,73.29,73.67,74.05,74.42,74.79,75.15,75.51,75.86,76.2,76.54,76.88,77.22,77.55,77.88,78.19,78.5,78.78,79.05,79.31,79.55,79.78,80.01,80.24,80.48,80.71,80.95,81.18,81.41,81.64,81.86,82.08 +Mauritania,37.95,38.53,39.14,39.77,40.42,41.09,41.78,42.48,43.2,43.91,44.62,45.31,45.96,46.59,47.18,47.73,48.26,48.78,49.27,49.77,50.25,50.73,51.2,51.69,52.19,52.73,53.29,53.89,54.51,55.13,55.75,56.34,56.9,57.41,57.86,58.28,58.64,58.96,59.25,59.5,60.2,60.4,60.7,60.7,61.2,61.5,62.0,62.5,63.2,63.8,64.2,64.9,65.5,65.9,66.3,67.0,67.5,67.9,68.2,68.6,68.8,69.1,69.3,69.6,69.7,69.8 +Mauritius,48.57,49.61,50.68,51.78,52.92,54.09,55.28,56.46,57.63,58.74,59.75,60.64,61.38,61.97,62.4,62.67,62.85,62.97,63.05,63.14,63.27,63.45,63.68,63.99,64.37,64.83,65.34,65.87,66.41,66.92,67.34,67.7,67.96,68.14,68.26,68.38,68.53,68.74,68.99,69.3,69.6,69.7,69.8,70.0,70.3,70.5,70.7,71.0,71.2,71.4,71.6,71.7,71.9,72.1,72.4,72.5,72.7,72.9,73.2,73.4,73.7,74.1,74.2,74.3,74.5,74.7 +Mayotte,45.38,46.68,47.92,49.11,50.24,51.32,52.34,53.3,54.22,55.09,55.92,56.72,57.5,58.25,58.98,59.7,60.39,61.07,61.73,62.36,62.99,63.59,64.17,64.74,65.3,65.84,66.36,66.88,67.38,67.86,68.34,68.8,69.25,69.69,70.12,70.54,70.95,71.35,71.75,72.14,72.53,72.91,73.28,73.64,74.0,74.35,74.7,75.03,75.36,75.69,76.01,76.33,76.64,76.95,77.24,77.53,77.8,78.05,78.29,78.52,78.74,78.96,79.19,79.42,79.65,79.88 +Mexico,49.27,50.37,51.42,52.43,53.39,54.29,55.14,55.94,56.67,57.34,57.95,58.49,58.96,59.4,59.78,60.15,60.53,60.91,61.32,61.77,62.25,62.75,63.29,63.83,64.39,64.95,65.51,66.05,66.58,67.09,67.58,68.05,68.52,68.97,69.4,69.84,70.26,70.67,71.09,71.5,71.9,72.1,72.4,72.7,73.0,73.3,73.6,73.7,74.1,74.6,74.9,74.9,74.9,75.2,75.1,75.4,75.6,75.4,75.3,75.4,75.7,75.7,75.4,75.6,75.9,76.2 +"Micronesia, Fed. 
Sts.",53.56,53.92,54.28,54.65,55.01,55.37,55.73,56.09,56.45,56.82,57.18,57.54,57.9,58.26,58.63,58.99,59.36,59.73,60.1,60.48,60.89,61.3,61.71,62.12,62.5,62.85,63.14,63.37,63.53,63.64,63.71,63.77,63.81,63.86,63.92,63.99,64.07,64.15,64.23,64.3,64.5,64.7,64.9,65.1,65.4,65.7,65.9,66.1,66.3,66.6,66.8,66.0,67.3,67.4,67.6,67.7,67.9,68.0,68.1,68.3,68.4,68.6,68.7,68.8,68.9,69.0 +Moldova,58.5,58.96,59.42,59.85,60.27,60.68,61.07,61.46,61.84,62.22,62.61,62.99,63.38,63.77,64.14,64.48,64.78,65.03,65.23,65.39,65.48,65.55,65.58,65.6,65.6,65.57,65.52,65.47,65.41,65.4,65.48,65.68,65.98,66.38,66.83,67.29,67.69,67.98,68.16,68.2,67.4,67.6,67.4,65.8,65.4,66.1,67.9,68.5,68.4,68.6,69.2,69.6,69.9,70.2,69.5,69.8,70.0,70.4,70.6,70.5,72.3,72.4,73.3,73.6,73.9,74.2 +Mongolia,43.09,43.41,43.83,44.34,44.96,45.66,46.46,47.33,48.25,49.2,50.15,51.08,51.94,52.74,53.48,54.16,54.8,55.43,56.02,56.58,57.08,57.49,57.82,58.06,58.22,58.31,58.36,58.4,58.46,58.56,58.73,59.0,59.34,59.76,60.22,60.71,61.18,61.61,61.98,62.3,62.3,62.2,62.0,62.0,61.7,61.7,61.9,62.1,62.3,62.5,62.7,62.9,63.1,63.4,63.6,64.0,64.4,64.8,65.0,65.2,65.6,66.0,66.4,66.8,67.1,67.4 +Montenegro,59.32,59.59,59.91,60.31,60.78,61.3,61.87,62.5,63.17,63.86,64.54,65.21,65.86,66.47,67.05,67.62,68.19,68.78,69.36,69.94,70.48,70.99,71.41,71.78,72.07,72.33,72.55,72.75,72.95,73.16,73.35,73.52,73.68,73.83,73.96,74.08,74.21,74.35,74.47,74.6,74.4,74.2,73.9,73.7,73.5,73.4,73.3,73.1,73.0,73.3,73.5,74.0,74.5,74.8,75.0,75.2,75.6,76.0,76.3,76.5,76.7,76.8,76.9,77.1,77.2,77.3 +Morocco,45.84,46.21,46.58,46.98,47.39,47.81,48.25,48.7,49.17,49.64,50.11,50.6,51.09,51.58,52.06,52.54,53.0,53.46,53.91,54.34,54.77,55.19,55.62,56.08,56.56,57.11,57.72,58.39,59.13,59.93,60.77,61.63,62.49,63.33,64.14,64.91,65.66,66.38,67.06,67.7,68.1,68.4,68.6,69.1,69.5,70.0,70.4,70.8,71.1,71.5,71.8,72.0,72.3,72.5,72.7,72.9,73.1,73.3,73.5,73.7,73.9,74.1,74.3,74.4,74.6,74.8 +Mozambique,32.26,32.92,33.58,34.25,34.91,35.58,36.23,36.89,37.54,38.17,38.79,39.4,39.98,40.54,41.1,41.66,42.21,42.78,43.37,43.97,44.58,45.21,45.85,46.46,47.06,47.61,48.1,48.52,48.88,49.17,49.4,49.57,49.72,49.87,50.02,50.21,50.45,50.74,51.08,51.5,51.7,52.1,52.3,52.6,52.7,52.6,52.5,52.6,52.6,52.3,52.8,52.7,52.9,53.0,52.9,53.0,53.2,54.0,54.4,54.4,54.5,54.5,54.8,56.1,57.1,58.12 +Myanmar,33.8,35.24,36.53,37.69,38.71,39.6,40.36,41.03,41.65,42.25,42.9,43.64,44.47,45.4,46.4,47.39,48.31,49.11,49.78,50.31,50.72,51.09,51.44,51.78,52.15,52.54,52.93,53.31,53.69,54.07,54.44,54.8,55.16,55.52,55.87,56.23,56.58,56.93,57.26,57.6,57.8,58.1,58.4,58.8,59.0,59.4,59.7,60.1,60.4,60.8,61.3,61.7,62.3,62.8,63.4,64.0,64.6,59.4,65.6,66.0,66.4,66.8,67.2,67.6,68.0,68.4 +Namibia,40.72,41.49,42.23,42.96,43.69,44.39,45.09,45.76,46.42,47.07,47.7,48.31,48.9,49.48,50.05,50.61,51.17,51.71,52.26,52.81,53.36,53.91,54.44,54.98,55.51,56.04,56.56,57.07,57.57,58.06,58.54,59.01,59.45,59.87,60.27,60.65,61.0,61.3,61.54,61.7,61.9,62.0,62.0,61.5,60.5,59.3,58.1,56.7,55.4,54.0,53.4,52.7,52.4,52.5,53.1,54.9,57.5,59.1,60.3,61.4,62.6,63.6,63.9,64.1,64.2,64.3 +Nepal,35.53,36.0,36.48,36.96,37.43,37.9,38.38,38.85,39.32,39.8,40.26,40.74,41.21,41.67,42.14,42.6,43.05,43.51,43.97,44.43,44.91,45.41,45.92,46.47,47.05,47.64,48.28,48.94,49.63,50.32,51.06,51.81,52.57,53.36,54.17,54.98,55.83,56.68,57.53,58.4,59.1,60.0,60.2,61.0,61.7,62.5,63.4,63.9,64.6,65.2,65.9,65.9,66.8,67.0,67.4,67.8,68.1,68.4,68.7,69.0,69.3,69.7,69.9,70.2,69.7,69.2 
+Netherlands,71.5,72.12,71.7,72.39,72.51,72.52,72.97,73.13,73.17,73.35,73.54,73.21,73.33,73.71,73.58,73.52,73.79,73.6,73.51,73.57,73.81,73.72,74.17,74.56,74.49,74.61,75.2,75.11,75.59,75.72,75.93,76.01,76.21,76.28,76.34,76.31,76.78,76.98,76.82,77.0,77.2,77.3,77.2,77.5,77.6,77.6,77.9,78.1,78.0,78.1,78.3,78.5,78.7,79.1,79.6,79.9,80.2,80.3,80.6,80.8,80.9,81.0,81.2,81.3,81.3,81.3 +Netherlands Antilles,58.96,60.02,61.0,61.89,62.7,63.43,64.08,64.65,65.15,65.6,65.99,66.34,66.67,67.0,67.33,67.67,68.03,68.41,68.81,69.22,69.63,70.05,70.45,70.84,71.21,71.58,71.94,72.29,72.64,72.96,73.27,73.56,73.8,74.02,74.19,74.33,74.42,74.49,74.52,74.54,74.53,74.52,74.5,74.49,74.48,74.48,74.5,74.53,74.57,74.65,74.76,74.91,75.09,75.3,75.53,75.76,75.98,76.18,76.36,76.52,76.65,76.77,76.89,77.01,77.14,77.27 +New Caledonia,49.51,50.34,51.16,51.96,52.74,53.5,54.25,54.98,55.69,56.38,57.06,57.72,58.36,58.99,59.6,60.19,60.77,61.33,61.89,62.42,62.95,63.46,63.95,64.44,64.91,65.37,65.81,66.25,66.68,67.09,67.5,67.89,68.28,68.65,69.02,69.37,69.72,70.05,70.38,70.7,71.01,71.31,71.6,71.89,72.16,72.43,72.7,72.95,73.21,73.46,73.7,73.94,74.17,74.4,74.62,74.84,75.05,75.26,75.47,75.67,75.88,76.09,76.31,76.52,76.74,76.96 +New Zealand,69.17,69.4,70.25,70.36,70.49,70.75,70.27,70.9,70.82,71.28,71.0,71.26,71.33,71.37,71.3,71.16,71.54,71.2,71.57,71.35,71.8,71.92,71.78,72.03,72.3,72.5,72.25,73.14,73.18,72.98,73.77,73.87,73.97,74.53,74.03,74.28,74.36,74.64,75.05,75.6,75.9,76.2,76.5,76.7,77.0,77.3,77.6,78.0,78.2,78.4,78.6,78.9,79.1,79.4,79.8,79.9,80.1,80.3,80.5,80.8,80.8,81.1,81.4,81.4,81.4,81.4 +Nicaragua,43.38,44.18,44.98,45.78,46.59,47.4,48.22,49.04,49.86,50.69,51.53,52.36,53.19,54.04,54.88,55.74,56.6,57.47,58.33,59.18,60.01,60.8,61.56,62.28,62.95,63.59,64.17,64.73,65.28,65.83,66.38,66.95,67.56,68.2,68.89,69.67,70.51,71.4,72.35,73.3,73.7,73.6,73.9,74.1,74.4,74.7,75.0,73.2,75.6,76.0,76.2,76.3,76.3,76.4,76.6,76.7,76.8,77.0,77.1,77.2,77.4,77.5,77.6,77.8,78.0,78.2 +Niger,35.61,35.72,35.83,35.95,36.08,36.22,36.37,36.51,36.67,36.82,36.97,37.1,37.24,37.36,37.49,37.61,37.73,37.88,38.05,38.24,38.45,38.69,38.95,39.25,39.57,39.97,40.4,40.9,41.44,42.0,42.58,43.13,43.66,44.15,44.63,45.09,45.57,46.07,46.62,47.2,47.9,48.2,48.6,49.1,49.5,50.2,50.6,51.2,51.8,52.4,52.9,53.7,54.4,55.2,55.9,56.6,57.3,58.0,58.6,59.2,59.6,60.0,60.4,60.7,61.0,61.3 +Nigeria,35.25,35.74,36.25,36.79,37.35,37.93,38.53,39.14,39.76,40.39,41.0,41.61,42.19,42.75,43.29,43.81,38.31,33.47,31.63,41.79,46.56,47.16,47.77,48.38,49.0,49.62,50.24,50.84,51.42,51.95,52.41,52.8,53.12,53.36,53.54,53.67,53.78,53.88,53.98,54.1,54.3,54.4,54.5,54.9,55.0,55.0,55.0,55.1,55.2,55.2,55.4,55.3,55.6,56.1,56.8,57.4,58.3,59.2,60.3,61.2,62.0,62.6,63.3,63.7,64.6,65.51 +North Korea,26.78,24.76,31.74,42.66,46.7,48.18,49.16,49.73,50.43,50.9,51.25,51.64,52.15,52.86,53.76,54.84,55.97,57.07,58.1,59.06,59.93,60.74,61.5,62.22,62.88,63.49,64.04,64.53,64.98,65.39,65.75,66.08,66.4,66.69,67.0,67.36,67.78,68.22,68.63,68.9,69.2,69.4,69.6,69.7,58.6,58.7,58.8,58.9,59.0,59.1,59.2,59.3,69.9,70.0,70.2,70.4,70.6,70.9,71.0,71.2,71.4,71.6,71.8,71.9,72.1,72.3 +Norway,72.58,72.72,73.2,73.28,73.5,73.55,73.5,73.5,73.63,73.66,73.67,73.55,73.2,73.7,73.83,74.11,74.18,74.07,73.78,74.19,74.3,74.46,74.56,74.88,74.93,75.17,75.51,75.54,75.54,75.8,76.0,76.13,76.19,76.36,76.07,76.21,76.07,76.17,76.52,76.6,77.0,77.1,77.5,77.7,77.9,78.2,78.3,78.3,78.5,78.6,78.9,79.1,79.5,79.8,80.2,80.4,80.6,80.8,80.8,81.1,81.1,81.6,81.6,82.0,82.0,82.0 
+Oman,35.74,36.78,37.81,38.82,39.82,40.8,41.78,42.75,43.7,44.64,45.57,46.47,47.37,48.26,49.13,49.97,50.8,51.62,52.43,53.26,54.14,55.07,56.06,57.11,58.2,59.32,60.45,61.57,62.65,63.7,64.69,65.65,66.59,67.48,68.35,69.17,69.95,70.7,71.41,72.1,72.5,72.9,73.3,73.6,73.9,74.2,74.5,74.8,75.1,75.2,75.4,75.4,75.6,75.8,76.0,76.0,76.0,76.2,76.2,76.1,76.3,76.6,76.8,77.0,77.2,77.4 +Pakistan,36.85,38.07,39.26,40.42,41.56,42.67,43.75,44.8,45.81,46.79,47.73,48.63,49.47,50.27,51.01,51.7,52.34,52.95,53.52,54.06,54.6,55.12,55.64,56.16,56.68,57.17,57.63,58.05,58.44,58.79,59.13,59.45,59.77,60.09,60.43,60.77,61.11,61.45,61.78,62.1,62.2,62.1,62.0,61.9,61.8,61.9,61.8,62.0,62.1,62.3,62.5,62.6,62.8,63.1,62.2,63.7,63.8,64.1,64.3,64.5,64.9,65.1,65.4,65.6,65.9,66.2 +Panama,56.42,56.99,57.56,58.14,58.72,59.31,59.89,60.47,61.05,61.62,62.17,62.71,63.22,63.72,64.21,64.7,65.18,65.65,66.15,66.66,67.18,67.72,68.26,68.81,69.35,69.88,70.38,70.85,71.3,71.72,72.1,72.47,72.8,73.13,73.45,73.76,74.06,74.34,74.62,74.9,75.0,75.0,75.2,75.2,75.3,75.4,75.6,75.8,76.2,76.5,76.7,76.9,77.0,77.1,77.2,77.2,77.3,77.3,77.3,77.3,77.4,77.5,77.6,77.9,78.2,78.5 +Papua New Guinea,34.02,34.53,35.04,35.54,36.03,36.53,37.02,37.51,38.04,38.6,39.2,39.87,40.6,41.39,42.22,43.07,43.92,44.74,45.53,46.27,46.97,47.63,48.27,48.9,49.54,50.21,50.91,51.65,52.4,53.11,53.74,54.26,54.65,54.92,55.08,55.19,55.3,55.47,55.7,56.0,56.0,56.2,56.4,56.7,56.9,57.0,57.2,56.5,57.4,57.5,57.6,57.6,57.7,57.7,57.9,58.0,58.2,58.6,58.8,59.1,59.4,59.7,60.2,60.5,60.9,61.3 +Paraguay,64.04,64.16,64.33,64.52,64.76,65.03,65.33,65.65,66.0,66.35,66.7,67.03,67.33,67.61,67.87,68.11,68.37,68.63,68.9,69.2,69.49,69.78,70.06,70.32,70.57,70.81,71.04,71.28,71.51,71.73,71.97,72.19,72.41,72.64,72.87,73.11,73.36,73.62,73.91,74.2,74.2,74.1,74.1,74.0,74.1,74.1,74.2,74.2,74.3,74.2,74.2,74.1,74.1,73.8,74.0,74.0,74.0,74.0,74.0,74.0,74.0,74.1,74.1,74.3,74.4,74.5 +Peru,43.99,44.43,44.91,45.41,45.95,46.51,47.1,47.72,48.34,48.95,49.56,50.14,50.7,51.25,51.79,52.38,53.03,53.74,54.52,55.36,56.2,57.04,57.85,58.6,59.31,59.99,60.63,61.28,61.93,62.59,63.25,63.9,64.55,65.18,65.8,66.41,66.99,67.57,68.14,68.7,69.2,69.5,70.0,70.5,71.1,71.7,72.4,73.1,73.9,74.6,75.2,75.7,76.2,76.7,77.2,77.7,77.9,78.2,78.2,78.4,78.5,78.7,79.1,79.3,79.5,79.7 +Philippines,55.43,55.83,56.23,56.61,56.99,57.36,57.74,58.11,58.46,58.82,59.17,59.53,59.87,60.21,60.56,60.91,61.26,61.6,61.94,62.26,62.54,62.77,62.95,63.1,63.21,63.32,63.44,63.6,63.81,64.06,64.37,64.74,65.13,65.53,65.95,66.35,66.72,67.05,67.34,67.6,67.9,68.2,68.3,68.6,68.8,68.9,69.0,69.0,69.2,69.1,69.0,69.0,69.1,69.1,69.1,69.2,69.7,69.8,69.9,70.1,70.2,70.3,70.3,70.7,71.0,71.3 +Poland,59.68,60.87,61.96,62.97,63.9,64.74,65.5,65.97,65.59,67.92,68.04,67.71,68.64,68.87,69.58,69.99,69.69,70.33,69.83,69.96,69.76,70.95,70.95,71.46,70.88,70.88,70.78,70.71,71.05,70.4,71.38,71.45,71.29,71.03,70.78,71.07,71.12,71.49,71.25,70.9,70.7,71.1,71.7,71.7,71.9,72.4,72.7,73.0,73.1,73.8,74.2,74.6,74.9,75.0,75.1,75.2,75.2,75.4,75.7,76.2,76.5,76.7,77.3,77.4,77.6,77.8 +Portugal,58.71,59.81,61.11,62.25,61.42,61.22,61.49,63.79,62.97,64.23,62.85,64.37,65.0,65.22,66.17,65.67,66.57,66.88,66.49,67.14,66.91,69.23,68.63,69.18,68.9,69.12,70.37,70.83,71.64,71.71,71.9,72.73,72.65,72.94,73.22,73.61,74.0,74.02,74.58,74.2,74.2,74.6,74.7,75.5,75.5,75.5,75.8,76.1,76.4,76.8,76.8,77.3,77.6,78.2,78.4,79.0,79.2,79.4,79.6,79.9,80.2,80.4,80.7,80.7,80.8,80.9 +Puerto 
Rico,61.57,62.94,64.16,65.22,66.13,66.87,67.48,67.94,68.31,68.58,68.8,69.0,69.21,69.45,69.72,70.03,70.36,70.7,71.03,71.35,71.66,71.98,72.26,72.53,72.77,72.98,73.15,73.28,73.38,73.47,73.56,73.65,73.76,73.89,74.0,74.07,74.08,74.04,73.93,73.8,73.8,73.7,73.8,73.1,73.3,73.6,74.5,75.1,75.2,75.6,75.8,76.2,76.5,76.5,76.6,76.8,76.9,77.0,77.1,77.1,77.4,77.7,77.9,78.2,78.5,78.8 +Qatar,53.86,54.67,55.47,56.26,57.04,57.81,58.58,59.33,60.08,60.82,61.57,62.31,63.06,63.79,64.53,65.25,65.95,66.64,67.29,67.91,68.49,69.03,69.52,69.98,70.4,70.79,71.15,71.49,71.83,72.14,72.45,72.75,73.01,73.27,73.51,73.74,73.94,74.14,74.32,74.5,74.4,74.5,74.5,74.4,74.4,74.5,74.6,74.6,74.6,74.7,75.0,75.0,75.2,75.8,76.3,76.7,77.3,77.9,78.5,79.2,79.7,79.9,79.9,79.8,79.7,79.6 +Reunion,45.98,47.28,48.53,49.72,50.86,51.94,52.96,53.93,54.85,55.73,56.57,57.37,58.15,58.9,59.64,60.36,61.06,61.74,62.41,63.06,63.69,64.3,64.89,65.46,66.0,66.53,67.05,67.55,68.03,68.51,68.97,69.43,69.87,70.3,70.73,71.14,71.54,71.94,72.32,72.69,73.06,73.41,73.77,74.11,74.45,74.79,75.12,75.44,75.76,76.08,76.38,76.68,76.97,77.26,77.53,77.81,78.08,78.35,78.62,78.88,79.14,79.4,79.65,79.89,80.12,80.35 +Romania,61.13,61.07,61.19,61.47,61.93,62.54,63.29,64.14,65.04,65.92,66.7,67.32,67.74,67.96,68.02,67.98,67.95,68.01,68.16,68.41,68.73,69.06,69.34,69.58,69.75,69.87,69.95,70.01,70.06,70.1,70.12,70.11,70.1,70.08,70.05,70.02,70.0,69.98,69.99,70.0,70.5,70.0,69.8,69.5,69.4,69.1,69.1,69.8,70.6,71.1,71.1,71.2,71.6,72.0,72.4,72.8,73.3,73.2,73.3,73.7,74.5,74.7,74.9,75.1,75.2,75.3 +Russia,57.76,58.16,58.96,60.96,63.35,64.85,63.95,66.84,67.59,68.61,68.85,68.51,68.98,69.77,69.36,69.43,69.21,69.17,68.65,68.76,69.02,68.92,68.89,68.88,68.24,67.98,67.85,67.89,67.61,67.57,67.79,68.25,68.01,67.53,68.19,69.8,69.81,69.66,69.57,69.2,69.1,68.0,65.2,63.8,64.4,65.7,67.0,67.2,65.9,65.1,65.1,64.9,64.7,65.1,65.1,66.7,67.7,67.9,68.8,68.9,69.8,70.4,70.8,70.9,71.0,71.1 +Rwanda,39.99,40.32,40.66,41.0,41.34,41.69,42.03,42.38,42.73,43.07,43.41,43.74,44.05,44.35,44.62,44.85,45.07,45.27,45.44,45.58,45.71,45.81,45.91,46.01,46.13,46.31,46.54,46.81,47.12,47.46,47.88,48.32,48.69,48.88,49.15,49.42,49.69,49.96,50.23,50.5,49.3,48.0,46.7,13.2,43.8,44.6,44.0,45.6,47.2,49.2,51.0,53.5,55.5,57.6,59.6,61.6,63.1,64.1,64.3,65.1,65.3,65.5,65.6,65.7,65.9,66.1 +Samoa,46.08,46.69,47.3,47.9,48.5,49.09,49.69,50.28,50.87,51.45,52.04,52.62,53.21,53.8,54.39,54.98,55.57,56.15,56.75,57.33,57.92,58.5,59.09,59.67,60.26,60.84,61.44,62.02,62.62,63.2,63.79,64.36,64.94,65.51,66.1,66.67,67.27,67.87,68.49,69.1,69.1,69.5,69.7,69.8,70.0,70.2,70.4,70.6,70.7,70.8,71.0,71.2,71.4,71.6,71.8,72.0,72.1,72.3,70.4,72.6,72.7,72.7,73.0,73.1,73.2,73.3 +Sao Tome and Principe,46.1,46.54,47.01,47.52,48.05,48.6,49.18,49.77,50.38,51.01,51.62,52.21,52.79,53.36,53.92,54.47,55.03,55.6,56.19,56.81,57.47,58.13,58.81,59.47,60.09,60.63,61.08,61.42,61.67,61.83,61.93,62.02,62.12,62.24,62.4,62.59,62.79,63.0,63.2,63.4,63.5,63.6,63.7,64.0,64.1,63.9,63.9,64.0,64.4,64.6,64.9,65.0,65.3,65.4,65.5,65.7,65.7,66.0,66.7,66.9,67.2,67.4,67.6,67.8,68.0,68.2 +Saudi Arabia,42.31,42.89,43.47,44.05,44.64,45.23,45.82,46.42,47.02,47.62,48.22,48.84,49.48,50.15,50.88,51.68,52.55,53.51,54.55,55.65,56.82,58.04,59.26,60.48,61.67,62.83,63.95,65.01,66.03,66.99,67.89,68.74,69.54,70.3,71.01,71.66,72.28,72.85,73.39,73.9,74.3,74.6,74.9,75.1,75.5,75.8,76.0,76.3,76.6,76.8,77.1,77.2,77.4,77.5,77.8,77.9,78.2,78.3,78.5,78.7,78.9,79.2,79.3,79.4,79.5,79.6 
+Senegal,34.89,35.39,35.88,36.34,36.78,37.19,37.57,37.93,38.23,38.46,38.63,38.72,38.76,38.74,38.71,38.7,38.74,38.9,39.17,39.59,40.18,40.94,41.85,42.85,43.94,45.07,46.21,47.33,48.4,49.42,50.43,51.44,52.47,53.48,54.45,55.36,56.17,56.86,57.41,57.8,58.0,58.0,58.2,58.2,58.4,58.8,58.9,59.1,59.2,59.7,60.2,60.4,61.3,61.7,62.2,62.5,63.0,63.5,63.9,64.2,64.4,64.6,64.8,65.0,65.3,65.6 +Serbia,58.63,59.11,59.61,60.12,60.63,61.15,61.69,62.23,62.78,63.33,63.88,64.44,64.99,65.53,66.06,66.56,67.05,67.51,67.94,68.34,68.7,69.03,69.32,69.59,69.82,70.03,70.21,70.37,70.53,70.68,70.84,71.01,71.19,71.38,71.58,71.78,71.99,72.17,72.35,72.5,71.4,72.4,72.3,72.1,72.0,71.9,72.1,71.5,71.0,72.1,72.4,72.5,72.7,72.9,73.2,73.6,74.0,74.3,74.6,74.8,75.1,75.4,75.7,75.9,76.2,76.5 +Seychelles,57.55,57.43,57.45,57.57,57.82,58.18,58.65,59.19,59.8,60.42,61.03,61.59,62.08,62.47,62.81,63.11,63.43,63.78,64.18,64.62,65.11,65.59,66.06,66.49,66.9,67.26,67.59,67.89,68.16,68.4,68.63,68.83,69.02,69.17,69.3,69.36,69.37,69.31,69.22,69.1,69.1,69.2,69.3,69.6,69.8,69.9,70.1,70.4,70.7,70.9,71.1,71.3,71.5,71.7,72.0,72.3,72.6,72.9,73.0,73.1,73.4,73.7,73.8,74.0,74.1,74.2 +Sierra Leone,31.66,32.13,32.62,33.1,33.6,34.09,34.59,35.08,35.58,36.07,36.57,37.06,37.57,38.1,38.7,39.38,40.18,41.08,42.08,43.15,44.28,45.39,46.48,47.5,48.45,49.31,50.11,50.83,51.49,52.04,52.5,52.83,53.06,53.16,53.14,52.98,52.72,52.36,51.98,51.6,51.4,51.9,52.1,51.6,50.9,51.9,51.3,49.7,49.2,51.5,51.8,51.6,51.7,52.0,52.3,52.7,53.0,53.6,54.2,55.0,55.6,56.4,57.1,55.2,57.1,59.07 +Singapore,58.62,59.54,60.41,61.24,62.01,62.73,63.39,64.01,64.54,65.02,65.41,65.72,65.97,66.16,66.31,66.46,66.63,66.84,67.09,67.4,67.75,68.12,68.5,68.88,69.26,69.62,69.98,70.34,70.68,71.04,71.39,71.74,72.11,72.49,72.87,73.27,73.68,74.1,74.51,74.9,75.6,76.0,76.2,76.3,76.4,76.7,77.2,77.6,78.0,78.3,78.6,78.9,79.3,79.8,80.0,80.2,80.4,80.6,81.0,81.3,81.5,81.6,81.7,81.9,82.0,82.1 +Slovak Republic,61.35,64.4,65.7,66.76,67.89,68.42,67.51,69.41,69.09,70.42,70.86,70.4,70.79,71.17,70.39,70.53,71.07,70.6,69.91,69.84,69.99,70.46,70.16,70.33,70.45,70.62,70.58,70.59,70.92,70.58,70.82,70.94,70.64,70.88,70.89,71.07,71.24,71.32,71.12,71.0,71.1,71.4,71.9,72.3,72.4,72.8,72.8,72.8,73.0,73.3,73.6,73.8,73.9,74.2,74.3,74.5,74.6,74.9,75.2,75.7,76.1,76.5,77.0,77.4,77.6,77.8 +Slovenia,64.71,65.28,65.83,66.34,66.81,67.25,67.66,68.02,68.34,68.62,68.82,68.98,69.08,69.12,69.14,69.14,69.14,69.17,69.23,69.32,69.47,69.66,69.86,70.09,70.32,70.51,70.66,70.77,70.85,70.89,70.94,71.03,70.74,71.2,71.63,72.17,72.1,72.75,73.19,73.7,73.6,73.8,73.9,74.2,74.6,75.0,75.2,75.4,75.7,76.1,76.3,76.6,76.8,77.2,77.6,77.9,78.2,78.7,79.1,79.5,79.9,80.1,80.3,80.8,80.9,81.0 +Solomon Islands,45.39,45.97,46.53,47.11,47.68,48.26,48.83,49.41,49.98,50.55,51.12,51.69,52.27,52.84,53.42,54.0,54.58,55.16,55.74,56.31,56.91,57.52,58.13,58.74,59.33,59.9,60.43,60.89,61.27,61.53,61.59,61.46,61.16,60.74,60.26,59.84,59.58,59.52,59.7,60.1,60.0,60.4,60.6,60.9,61.1,61.4,61.5,61.6,61.7,61.7,61.7,61.7,61.7,61.7,61.8,61.9,61.9,62.3,62.4,62.7,63.0,63.3,63.5,63.6,64.0,64.4 +Somalia,34.13,34.6,35.07,35.54,36.01,36.47,36.94,37.41,37.87,38.34,38.8,39.26,39.74,40.21,40.68,41.14,41.61,42.08,42.54,42.99,43.44,43.9,44.35,44.8,45.24,45.7,46.15,46.6,47.03,47.46,47.88,48.28,48.65,48.98,49.24,49.36,49.34,49.19,48.98,48.8,47.4,48.4,49.7,49.7,49.9,49.9,49.6,50.3,50.4,50.7,50.9,51.1,51.5,51.6,52.1,52.2,52.4,52.6,52.8,51.6,52.0,53.4,54.1,54.3,54.2,54.1 +South 
Africa,43.92,44.67,45.37,46.03,46.63,47.19,47.71,48.17,48.6,49.01,49.4,49.78,50.14,50.52,50.91,51.3,51.68,52.04,52.41,52.77,53.11,53.44,53.77,54.11,54.47,54.86,55.3,55.77,56.29,56.85,57.44,58.04,58.64,59.22,59.78,60.32,60.83,61.29,61.69,62.0,62.5,62.4,63.0,62.8,62.7,61.6,60.0,58.9,57.9,56.4,55.9,54.8,53.7,52.8,52.7,52.5,53.0,53.4,53.9,54.9,56.6,59.0,60.7,61.2,61.3,61.4 +South Korea,40.52,40.02,45.02,48.02,49.55,50.22,50.9,51.6,52.3,53.02,53.75,54.51,55.27,56.04,56.84,57.67,58.54,59.44,60.35,61.22,62.02,62.73,63.34,63.84,64.26,64.62,64.95,65.31,65.7,66.15,66.66,67.21,67.78,68.37,68.98,69.58,70.18,70.75,71.29,71.8,72.2,72.7,73.1,73.6,74.0,74.5,74.9,75.4,75.8,76.3,76.7,77.1,77.7,78.2,78.7,79.1,79.4,79.8,80.1,80.4,80.6,80.7,80.9,80.9,81.0,81.1 +South Sudan,28.6,29.37,30.11,30.82,31.51,32.17,32.81,33.42,34.02,34.61,35.18,35.75,36.32,36.9,37.48,38.04,38.6,39.15,39.68,40.21,40.75,41.29,41.84,42.39,42.93,43.43,43.87,44.26,44.61,44.93,45.25,45.6,46.01,46.5,47.06,47.72,48.45,49.23,50.05,50.9,51.0,51.6,51.9,52.3,52.7,53.1,53.4,53.8,54.1,54.4,54.7,54.9,55.0,55.2,55.3,55.4,55.5,55.6,55.8,56.0,55.9,56.0,56.0,56.1,56.1,56.1 +Spain,61.5,64.92,65.79,66.98,66.75,66.79,66.63,68.82,68.74,69.23,69.62,69.65,69.81,70.54,70.95,71.2,71.39,71.68,71.21,72.19,71.79,73.0,72.78,73.16,73.49,73.81,74.32,74.51,75.05,75.53,75.67,76.22,76.0,76.38,76.34,76.59,76.82,76.82,76.89,76.9,77.0,77.4,77.6,77.8,77.9,78.1,78.6,78.8,78.8,79.2,79.5,79.6,79.6,80.0,80.3,80.7,80.8,81.1,81.5,81.8,82.0,82.2,82.5,82.5,82.6,82.7 +Sri Lanka,53.25,54.34,55.32,56.22,57.01,57.71,58.32,58.86,59.32,59.76,60.18,60.61,61.06,61.55,62.07,62.62,63.17,63.7,64.21,64.69,65.15,65.56,65.97,66.36,66.76,67.17,67.6,68.06,68.52,68.97,69.35,69.64,69.83,69.93,69.97,70.0,70.05,70.16,70.32,70.5,71.3,72.0,72.9,72.8,71.7,71.3,71.4,72.0,72.4,72.4,73.3,73.7,74.0,69.4,73.9,73.9,74.4,74.0,74.1,75.0,76.4,76.8,77.1,77.4,77.6,77.8 +St. Lucia,51.89,52.09,52.4,52.81,53.32,53.92,54.6,55.36,56.15,56.97,57.75,58.47,59.11,59.66,60.15,60.58,61.0,61.45,61.94,62.47,63.04,63.63,64.22,64.8,65.38,65.96,66.54,67.1,67.64,68.15,68.6,68.99,69.29,69.53,69.7,69.83,69.93,70.03,70.12,70.2,70.4,70.5,70.7,70.9,71.1,71.2,71.5,71.7,71.8,72.0,72.1,72.3,72.5,72.8,73.1,73.4,73.7,74.1,74.3,74.5,74.6,74.7,74.7,74.8,74.8,74.8 +St. 
Vincent and the Grenadines,50.11,50.59,51.19,51.89,52.69,53.58,54.57,55.63,56.73,57.85,58.96,59.99,60.93,61.75,62.46,63.06,63.58,64.04,64.46,64.84,65.16,65.43,65.64,65.82,65.99,66.16,66.36,66.61,66.88,67.19,67.52,67.84,68.15,68.43,68.68,68.92,69.14,69.34,69.53,69.7,69.7,69.7,69.7,69.6,69.6,69.4,69.7,69.8,69.6,69.1,69.7,69.7,70.1,70.2,70.4,70.6,70.8,70.9,71.1,71.1,71.0,71.1,70.8,71.1,71.2,71.3 +Sudan,44.44,45.08,45.71,46.31,46.88,47.45,48.0,48.53,49.04,49.54,50.04,50.52,50.99,51.47,51.94,52.42,52.9,53.36,53.82,54.26,54.68,55.06,55.41,55.73,56.0,56.23,56.44,56.63,56.8,56.95,57.11,57.27,57.44,57.61,57.81,58.01,58.23,58.44,58.67,58.9,59.2,59.4,59.5,60.2,60.5,60.6,60.8,61.2,62.0,62.4,62.8,63.3,63.5,63.7,64.6,64.9,65.3,65.5,65.7,66.1,66.3,66.7,66.9,67.2,67.5,67.8 +Suriname,55.52,56.24,56.93,57.57,58.16,58.72,59.24,59.71,60.16,60.58,61.0,61.41,61.81,62.23,62.65,63.07,63.49,63.89,64.28,64.66,65.0,65.32,65.62,65.91,66.19,66.47,66.76,67.07,67.39,67.71,68.02,68.31,68.57,68.79,68.98,69.15,69.29,69.43,69.57,69.7,69.9,69.8,69.7,69.8,70.1,70.2,70.2,70.1,69.9,69.7,69.5,69.4,69.5,69.7,69.9,70.0,70.1,70.2,70.5,70.7,71.0,71.3,71.6,71.8,72.0,72.2 +Swaziland,41.01,41.51,41.98,42.44,42.88,43.3,43.7,44.08,44.44,44.78,45.1,45.42,45.73,46.05,46.39,46.76,47.2,47.67,48.21,48.79,49.4,50.03,50.67,51.3,51.94,52.58,53.24,53.92,54.62,55.31,56.02,56.71,57.38,58.0,58.6,59.15,59.67,60.13,60.5,60.7,60.7,61.0,61.3,60.7,59.1,57.1,55.8,53.5,51.4,48.8,46.6,45.1,44.0,43.0,42.5,43.1,44.3,45.1,45.9,46.4,48.0,49.1,49.4,49.8,51.8,53.88 +Sweden,71.35,71.84,71.88,72.34,72.58,72.64,72.47,73.11,73.34,73.01,73.47,73.34,73.53,73.7,73.85,74.09,74.12,73.99,74.11,74.66,74.58,74.68,74.83,74.94,74.95,74.96,75.39,75.48,75.52,75.74,76.04,76.36,76.6,76.86,76.72,76.98,77.12,77.01,77.67,77.6,77.7,78.1,78.3,78.5,78.9,79.1,79.4,79.5,79.5,79.7,79.8,80.0,80.2,80.2,80.6,80.8,80.9,81.1,81.2,81.6,81.7,81.8,81.9,82.1,82.1,82.1 +Switzerland,68.72,69.63,69.55,70.02,70.1,70.23,70.58,71.32,71.48,71.46,71.79,71.35,71.34,72.23,72.36,72.5,72.8,72.75,72.76,73.18,73.3,73.82,74.12,74.47,74.86,74.98,75.43,75.39,75.69,75.69,75.92,76.26,76.27,76.87,76.99,77.17,77.47,77.49,77.68,77.5,77.6,77.9,78.3,78.4,78.5,79.1,79.2,79.5,79.8,79.8,80.2,80.4,80.6,81.0,81.3,81.5,81.7,82.0,82.0,82.3,82.6,82.7,82.8,82.9,83.0,83.1 +Syria,47.87,48.44,49.02,49.59,50.15,50.7,51.25,51.79,52.33,52.87,53.43,53.98,54.56,55.15,55.77,56.42,57.12,57.83,58.57,59.31,60.08,60.82,61.56,62.26,62.95,63.6,64.24,64.84,65.44,66.01,66.56,67.08,67.58,68.05,68.51,68.94,69.35,69.75,70.14,70.5,71.0,71.8,72.0,72.3,72.7,73.1,73.4,73.8,74.1,74.4,74.6,74.9,75.1,75.3,75.5,75.7,75.9,76.1,76.3,76.5,75.1,68.1,69.0,67.2,68.2,69.21 +Taiwan,55.11,58.51,60.31,62.01,62.41,62.51,62.41,64.21,64.22,64.42,64.92,65.22,66.02,66.72,67.42,67.42,67.52,67.62,68.62,68.67,69.08,69.38,69.43,69.8,70.05,70.41,70.58,71.15,71.28,71.53,71.63,72.14,72.12,72.79,72.98,73.11,73.4,73.22,73.53,73.8,74.2,74.3,74.5,74.6,74.6,74.7,75.2,75.4,75.3,76.0,76.4,76.9,77.3,77.3,77.4,77.8,78.2,78.4,78.7,79.0,78.8,79.0,79.3,79.4,79.5,79.6 +Tajikistan,52.94,53.4,53.87,54.33,54.79,55.26,55.72,56.17,56.64,57.1,57.57,58.03,58.51,58.98,59.45,59.9,60.34,60.77,61.17,61.55,61.9,62.23,62.53,62.81,63.08,63.34,63.57,63.81,64.04,64.28,64.53,64.8,65.07,65.34,65.55,65.69,65.73,65.67,65.5,65.3,65.3,62.6,64.2,64.1,64.1,63.3,64.8,64.9,65.5,65.8,66.1,66.5,66.9,67.5,68.0,68.7,69.2,69.6,70.0,70.1,70.1,70.8,71.4,71.9,72.4,72.9 
+Tanzania,41.66,42.19,42.69,43.18,43.63,44.05,44.46,44.84,45.22,45.57,45.91,46.26,46.62,46.99,47.37,47.77,48.19,48.62,49.07,49.53,50.03,50.55,51.09,51.65,52.19,52.71,53.19,53.61,53.98,54.29,54.56,54.82,55.05,55.26,55.44,55.54,55.58,55.51,55.39,55.2,55.1,54.7,54.5,54.0,53.9,53.8,53.8,53.7,53.8,54.3,54.8,55.4,55.9,56.5,57.1,57.9,59.1,60.4,60.8,61.4,61.7,61.9,62.7,63.3,64.1,64.91 +Thailand,51.14,51.5,51.9,52.32,52.78,53.28,53.8,54.35,54.91,55.46,56.01,56.51,56.98,57.4,57.8,58.18,58.56,58.96,59.39,59.86,60.33,60.82,61.29,61.77,62.24,62.7,63.15,63.62,64.1,64.62,65.22,65.91,66.69,67.52,68.36,69.15,69.84,70.38,70.76,71.0,71.0,70.9,70.8,70.6,70.6,70.6,70.5,70.4,70.5,70.7,71.2,71.7,72.1,72.2,73.1,73.5,73.8,73.9,74.0,74.2,74.3,74.4,74.4,74.6,74.7,74.8 +Timor-Leste,31.41,32.12,32.83,33.54,34.24,34.94,35.64,36.34,37.04,37.74,38.45,39.15,39.85,40.55,41.29,42.12,43.02,43.96,44.86,45.56,45.8,45.51,44.71,43.49,42.12,40.94,40.25,40.27,41.01,42.45,44.42,46.61,48.76,50.76,52.51,53.99,55.26,56.41,57.47,58.5,59.2,59.9,60.6,61.3,61.8,62.3,62.4,62.8,62.3,60.7,64.4,65.3,65.7,66.5,67.5,68.5,69.2,69.9,70.4,70.8,71.3,71.7,72.0,72.3,72.4,72.5 +Togo,34.69,35.42,36.15,36.86,37.57,38.28,38.98,39.68,40.38,41.06,41.74,42.42,43.1,43.77,44.43,45.09,45.75,46.41,47.07,47.72,48.36,49.0,49.63,50.26,50.88,51.49,52.09,52.7,53.29,53.87,54.43,54.97,55.48,55.96,56.39,56.79,57.14,57.42,57.65,57.8,57.8,57.9,57.8,57.6,57.6,57.3,56.9,56.6,56.8,56.7,56.7,56.7,56.4,56.8,56.8,57.5,57.5,57.5,58.0,58.7,59.6,60.3,60.7,61.1,61.5,61.9 +Tonga,58.0,58.35,58.7,59.05,59.41,59.77,60.12,60.48,60.84,61.2,61.56,61.91,62.26,62.6,62.94,63.27,63.61,63.93,64.26,64.58,64.88,65.17,65.44,65.69,65.93,66.16,66.39,66.61,66.84,67.08,67.32,67.56,67.8,68.04,68.27,68.48,68.67,68.83,68.98,69.1,69.3,69.4,69.5,69.5,69.6,69.7,69.7,69.7,69.6,69.6,69.6,69.7,69.6,69.8,70.0,70.1,70.2,70.3,68.6,70.7,70.8,71.0,71.2,71.3,71.5,71.7 +Trinidad and Tobago,57.36,57.85,58.39,58.98,59.61,60.27,60.97,61.68,62.38,63.07,63.68,64.2,64.63,64.94,65.17,65.31,65.41,65.5,65.6,65.73,65.91,66.11,66.33,66.57,66.83,67.08,67.33,67.54,67.73,67.89,68.03,68.16,68.28,68.39,68.5,68.62,68.74,68.87,68.98,69.1,69.3,69.2,69.3,69.2,69.3,69.3,69.4,69.6,69.3,69.5,69.8,69.9,70.4,70.9,71.1,71.3,71.5,71.7,71.8,71.8,71.9,72.0,72.1,72.3,72.4,72.5 +Tunisia,39.03,39.33,39.68,40.06,40.48,40.94,41.43,41.97,42.56,43.2,43.89,44.65,45.47,46.35,47.31,48.33,49.42,50.56,51.74,52.94,54.16,55.37,56.57,57.75,58.9,60.03,61.15,62.27,63.36,64.41,65.4,66.31,67.13,67.87,68.56,69.2,69.83,70.48,71.13,71.8,72.0,72.2,72.2,72.5,72.9,73.4,73.9,74.3,74.7,75.0,75.3,75.5,75.7,76.0,76.2,76.4,76.6,76.8,77.0,77.1,77.2,77.4,77.5,77.6,77.6,77.6 +Turkey,41.2,41.68,42.2,42.76,43.35,43.99,44.67,45.38,46.13,46.91,47.71,48.52,49.35,50.15,50.96,51.74,52.48,53.21,53.91,54.59,55.27,55.96,56.65,57.36,58.08,58.81,59.55,60.29,61.03,61.74,62.45,63.15,63.82,64.49,65.15,65.76,66.37,66.96,67.53,68.1,68.5,69.2,69.7,69.8,70.0,70.6,71.2,72.0,71.5,73.8,74.4,75.1,75.1,75.8,76.2,76.7,77.4,77.8,78.5,78.8,78.8,79.1,78.8,79.1,79.2,79.3 +Turkmenistan,50.89,51.34,51.79,52.25,52.69,53.14,53.58,54.03,54.47,54.91,55.36,55.82,56.27,56.72,57.17,57.61,58.02,58.42,58.8,59.15,59.46,59.74,60.01,60.25,60.49,60.73,61.0,61.28,61.58,61.9,62.24,62.57,62.89,63.18,63.44,63.63,63.77,63.85,63.89,63.9,63.5,63.5,63.5,63.4,63.3,63.2,63.2,63.3,63.5,63.7,64.1,64.4,64.8,65.3,65.8,66.3,66.8,67.2,67.6,68.1,68.5,68.9,69.2,69.6,70.0,70.4 
+Uganda,39.94,40.51,41.08,41.65,42.24,42.82,43.42,44.03,44.64,45.27,45.91,46.56,47.22,47.86,48.49,49.07,49.58,50.05,50.43,50.74,50.99,51.17,51.33,51.45,51.55,51.65,51.75,51.83,51.93,52.01,52.09,52.14,52.17,52.16,52.09,51.94,51.72,51.42,51.08,50.7,50.0,49.6,49.0,48.5,48.3,48.2,48.5,48.7,48.9,49.1,49.7,50.3,51.2,52.0,53.5,54.9,55.3,56.0,57.0,57.8,58.6,59.3,60.1,60.7,61.3,61.91 +Ukraine,62.2,62.94,63.63,64.42,66.26,67.15,67.19,68.88,69.26,70.88,71.15,70.56,71.18,71.97,71.4,71.66,71.25,71.33,70.7,70.59,70.81,70.57,70.75,70.63,69.96,70.01,69.68,69.63,69.36,69.33,69.36,69.51,69.48,69.17,69.47,70.82,70.61,70.49,70.43,70.0,69.4,68.8,68.3,67.5,66.5,66.7,67.3,68.1,67.7,67.3,67.5,67.5,67.7,67.5,67.1,67.9,67.6,67.8,69.6,70.5,71.1,71.2,71.3,71.3,71.5,71.7 +United Arab Emirates,41.83,43.04,44.22,45.37,46.5,47.62,48.7,49.77,50.82,51.85,52.89,53.91,54.91,55.9,56.87,57.81,58.7,59.54,60.33,61.08,61.78,62.45,63.09,63.7,64.3,64.87,65.41,65.93,66.43,66.91,67.36,67.79,68.2,68.6,68.98,69.34,69.68,70.0,70.31,70.6,70.8,71.1,71.3,71.6,71.9,72.1,72.4,72.8,73.0,73.3,73.6,73.8,74.1,74.4,75.2,75.7,75.6,75.6,75.6,75.6,75.5,75.5,75.4,75.4,75.4,75.4 +United Kingdom,68.26,69.55,69.82,70.19,70.15,70.42,70.54,70.71,70.81,71.02,70.77,70.84,70.74,71.53,71.52,71.43,72.06,71.68,71.64,71.89,72.2,71.98,72.18,72.38,72.65,72.62,73.11,73.04,73.14,73.57,73.9,74.03,74.28,74.66,74.51,74.78,75.12,75.23,75.36,75.7,76.0,76.2,76.3,76.6,76.7,76.9,77.1,77.3,77.5,77.8,78.0,78.2,78.4,78.7,79.0,79.2,79.5,79.7,80.0,80.2,80.5,80.7,80.8,80.9,81.0,81.1 +United States,68.22,68.44,68.79,69.58,69.63,69.71,69.49,69.76,69.98,69.91,70.32,70.21,70.04,70.33,70.41,70.43,70.76,70.42,70.66,70.92,71.24,71.34,71.54,72.08,72.68,72.99,73.38,73.58,74.03,73.93,74.36,74.65,74.71,74.81,74.79,74.87,75.01,75.02,75.1,75.4,75.5,75.8,75.7,75.8,75.9,76.3,76.6,76.8,76.9,76.9,76.9,77.1,77.3,77.6,77.6,77.8,78.1,78.3,78.5,78.8,78.9,79.0,79.1,79.1,79.1,79.1 +Uruguay,65.96,66.11,66.28,66.47,66.69,66.93,67.18,67.43,67.7,67.95,68.19,68.39,68.55,68.67,68.74,68.78,68.8,68.82,68.84,68.88,68.94,69.01,69.1,69.23,69.39,69.58,69.8,70.05,70.32,70.6,70.89,71.17,71.46,71.72,71.97,72.2,72.41,72.61,72.81,73.0,72.6,73.2,73.2,73.3,73.4,73.5,73.7,74.0,74.3,74.6,74.8,75.0,75.0,75.3,75.5,75.7,75.7,76.0,76.2,76.2,76.3,76.3,76.4,76.6,76.8,77.0 +Uzbekistan,55.32,55.78,56.23,56.68,57.13,57.58,58.02,58.46,58.91,59.35,59.8,60.25,60.7,61.15,61.59,62.02,62.43,62.83,63.2,63.54,63.86,64.14,64.4,64.64,64.87,65.11,65.37,65.64,65.94,66.25,66.59,66.93,67.27,67.59,67.85,68.02,68.09,68.06,67.96,67.8,67.6,67.3,67.0,66.7,66.6,66.7,66.9,67.1,67.4,67.6,67.8,67.9,68.1,68.3,68.5,68.8,69.2,69.6,69.9,70.2,70.6,70.9,71.2,71.5,71.8,72.1 +Vanuatu,40.79,41.36,41.94,42.51,43.09,43.67,44.24,44.82,45.4,45.97,46.55,47.14,47.71,48.29,48.87,49.44,50.01,50.56,51.12,51.67,52.21,52.77,53.33,53.89,54.46,55.05,55.64,56.24,56.83,57.41,57.97,58.5,58.98,59.44,59.87,60.27,60.67,61.07,61.48,61.9,62.0,62.1,62.2,62.2,62.3,62.4,61.2,62.5,62.0,62.5,62.5,62.5,62.5,62.6,62.7,62.9,63.2,63.4,63.6,63.9,64.1,64.4,64.6,64.7,64.9,65.1 +Venezuela,54.64,55.24,55.84,56.43,57.03,57.64,58.25,58.86,59.47,60.08,60.69,61.3,61.91,62.51,63.09,63.66,64.22,64.77,65.3,65.8,66.27,66.72,67.14,67.53,67.9,68.23,68.52,68.79,69.04,69.3,69.57,69.85,70.17,70.53,70.89,71.27,71.63,71.95,72.24,72.5,72.4,72.4,72.5,72.4,72.7,73.1,73.6,73.6,70.2,73.8,73.8,73.8,73.5,74.3,74.6,74.5,74.4,74.2,74.4,74.9,74.8,74.6,74.7,74.8,74.8,74.8 
+Vietnam,51.98,52.81,53.6,54.36,55.11,55.83,56.52,57.19,57.86,58.52,59.17,59.82,60.42,60.95,61.32,61.36,61.06,60.45,59.63,58.78,58.17,58.0,58.35,59.23,60.54,62.07,63.58,64.86,65.84,66.49,66.86,67.1,67.3,67.51,67.77,68.07,68.38,68.68,69.0,69.3,69.6,69.8,70.1,70.3,70.6,70.9,71.1,71.5,71.7,72.0,72.2,72.5,72.8,73.0,73.3,73.5,73.8,74.1,74.3,74.5,74.7,74.9,75.0,75.2,75.4,75.6 +Virgin Islands (U.S.),57.9,58.87,59.74,60.54,61.25,61.88,62.44,62.93,63.36,63.75,64.11,64.46,64.82,65.2,65.6,66.02,66.44,66.87,67.29,67.71,68.12,68.53,68.94,69.34,69.73,70.11,70.46,70.8,71.12,71.43,71.74,72.05,72.38,72.71,73.06,73.41,73.75,74.09,74.42,74.73,75.04,75.34,75.64,75.94,76.23,76.52,76.8,77.07,77.33,77.57,77.8,78.0,78.19,78.36,78.52,78.69,78.86,79.05,79.25,79.46,79.69,79.92,80.15,80.38,80.6,80.82 +West Bank and Gaza,47.03,47.31,47.63,47.97,48.36,48.78,49.23,49.72,50.25,50.82,51.43,52.08,52.75,53.47,54.2,54.94,55.7,56.45,57.22,57.97,58.73,59.48,60.26,61.03,61.81,62.6,63.39,64.18,64.96,65.74,66.48,67.21,67.92,68.59,69.23,69.82,70.38,70.88,71.36,71.8,72.0,72.4,72.8,73.3,73.7,74.0,74.2,74.5,74.7,74.4,74.7,74.4,74.4,74.4,74.6,74.4,74.3,74.1,73.8,74.3,74.2,74.2,74.4,74.5,74.6,74.7 +Western Sahara,34.95,35.33,35.72,36.1,36.48,36.86,37.24,37.62,37.99,38.37,38.75,39.12,39.5,39.88,40.26,40.62,40.97,41.32,41.67,42.07,42.52,43.07,43.7,44.43,45.23,46.11,47.01,47.92,48.82,49.72,50.61,51.5,52.4,53.3,54.17,54.99,55.74,56.43,57.04,57.59,58.09,58.56,59.03,59.51,60.0,60.51,61.04,61.57,62.11,62.64,63.15,63.65,64.13,64.58,65.01,65.41,65.79,66.16,66.51,66.84,67.17,67.47,67.76,68.04,68.3,68.56 +Yemen,24.0,24.96,25.92,26.87,27.84,28.8,29.76,30.72,31.68,32.64,33.58,34.52,35.45,36.37,37.27,38.15,39.01,39.87,40.71,41.55,42.4,43.28,44.17,45.1,46.05,47.05,48.06,49.08,50.11,51.13,52.13,53.09,54.02,54.89,55.69,56.4,57.04,57.6,58.08,58.5,58.9,59.3,59.6,59.7,60.3,60.7,61.1,61.5,62.0,62.4,62.8,63.3,63.7,64.2,64.6,65.0,65.2,65.7,66.2,66.6,66.6,66.7,67.1,67.1,66.0,64.92 +Zambia,43.22,43.79,44.38,44.95,45.53,46.1,46.67,47.24,47.79,48.34,48.89,49.42,49.94,50.44,50.96,51.49,52.05,52.64,53.25,53.88,54.51,55.13,55.71,56.24,56.7,57.07,57.36,57.57,57.66,57.62,57.45,57.14,56.71,56.17,55.54,54.85,54.09,53.33,52.59,51.9,50.7,49.6,48.6,47.7,46.9,46.3,45.9,45.4,45.0,44.8,44.9,45.1,45.3,46.3,47.1,47.9,49.0,51.1,52.3,53.1,53.7,54.7,55.6,56.3,56.7,57.1 +Zimbabwe,48.75,49.25,49.75,50.25,50.73,51.22,51.71,52.17,52.64,53.11,53.55,53.99,54.42,54.83,55.25,55.65,56.04,56.43,56.83,57.22,57.63,58.05,58.47,58.92,59.41,59.94,60.53,61.17,61.82,62.48,63.13,63.73,64.23,64.63,64.86,64.9,64.74,64.39,63.81,63.0,62.7,61.4,59.8,58.2,56.0,54.4,52.8,50.9,49.3,47.9,47.0,45.9,45.3,44.7,45.1,45.5,46.4,47.3,48.0,49.1,51.6,54.2,55.7,57.0,59.3,61.69 diff --git a/docs/previous_versions/v0.4.0/data/offshore.csv b/docs/previous_versions/v0.4.0/data/offshore.csv new file mode 100755 index 000000000..5aa096441 --- /dev/null +++ b/docs/previous_versions/v0.4.0/data/offshore.csv @@ -0,0 +1,828 @@ +college_grad,response +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion 
+yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion 
+yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no 
opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion diff --git a/docs/previous_versions/v0.4.0/data/zinc_tidy.csv b/docs/previous_versions/v0.4.0/data/zinc_tidy.csv new file mode 100755 index 000000000..84856e658 --- 
/dev/null +++ b/docs/previous_versions/v0.4.0/data/zinc_tidy.csv @@ -0,0 +1,21 @@ +loc_id,location,concentration +1.0,bottom,0.43 +1.0,surface,0.415 +2.0,bottom,0.266 +2.0,surface,0.238 +3.0,bottom,0.567 +3.0,surface,0.39 +4.0,bottom,0.531 +4.0,surface,0.41 +5.0,bottom,0.707 +5.0,surface,0.605 +6.0,bottom,0.716 +6.0,surface,0.609 +7.0,bottom,0.651 +7.0,surface,0.632 +8.0,bottom,0.589 +8.0,surface,0.523 +9.0,bottom,0.469 +9.0,surface,0.411 +10.0,bottom,0.723 +10.0,surface,0.612 diff --git a/docs/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg b/docs/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg new file mode 100755 index 000000000..92464e41e Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/apps.jpg b/docs/previous_versions/v0.4.0/images/apps.jpg new file mode 100755 index 000000000..7ef7ea59a Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/apps.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/coggle.png b/docs/previous_versions/v0.4.0/images/coggle.png new file mode 100755 index 000000000..668944334 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/coggle.png differ diff --git a/docs/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png b/docs/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png new file mode 100755 index 000000000..054694d97 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png differ diff --git a/docs/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png b/docs/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png new file mode 100755 index 000000000..d7037938b Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png differ diff --git a/docs/previous_versions/v0.4.0/images/dashboard.jpg b/docs/previous_versions/v0.4.0/images/dashboard.jpg new file mode 100755 index 000000000..57996bf17 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/dashboard.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp.png b/docs/previous_versions/v0.4.0/images/datacamp.png new file mode 100755 index 000000000..2911de3c4 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png new file mode 100755 index 000000000..17fcfa240 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png new file mode 100755 index 000000000..811743c26 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png new file mode 100755 index 000000000..143c4cee8 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_intermediate_R.png b/docs/previous_versions/v0.4.0/images/datacamp_intermediate_R.png new file 
mode 100755 index 000000000..81b3cf7fb Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_intermediate_R.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_intro_to_R.png b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_R.png new file mode 100755 index 000000000..193664acd Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_R.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png new file mode 100755 index 000000000..8bd13337a Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png new file mode 100755 index 000000000..69ca9772a Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png differ diff --git a/docs/previous_versions/v0.4.0/images/datacamp_working_with_data.png b/docs/previous_versions/v0.4.0/images/datacamp_working_with_data.png new file mode 100755 index 000000000..eeb4ac861 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/datacamp_working_with_data.png differ diff --git a/docs/previous_versions/v0.4.0/images/engine.jpg b/docs/previous_versions/v0.4.0/images/engine.jpg new file mode 100755 index 000000000..597512b49 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/engine.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/errors.png b/docs/previous_versions/v0.4.0/images/errors.png new file mode 100755 index 000000000..43c19d9a3 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/errors.png differ diff --git a/docs/previous_versions/v0.4.0/images/filter.png b/docs/previous_versions/v0.4.0/images/filter.png new file mode 100755 index 000000000..8cd96205d Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/filter.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png new file mode 100755 index 000000000..e14558e96 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png new file mode 100755 index 000000000..0ce574917 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png new file mode 100755 index 000000000..7c8b6c6a7 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png new file mode 100755 index 000000000..71139e1a1 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png new file mode 100755 index 
000000000..e78715c4d Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png new file mode 100755 index 000000000..dce19ad70 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png new file mode 100755 index 000000000..964f0ae8f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png new file mode 100755 index 000000000..83b51e66e Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png new file mode 100755 index 000000000..d9baa59f1 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/generate.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/generate.png new file mode 100755 index 000000000..d81baa6ff Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/generate.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht.png new file mode 100755 index 000000000..5effd3674 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png new file mode 100755 index 000000000..582bdad19 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/specify.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/specify.png new file mode 100755 index 000000000..7f68e18b7 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/specify.png differ diff --git a/docs/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png b/docs/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png new file mode 100755 index 000000000..895426ff3 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png differ diff --git a/docs/previous_versions/v0.4.0/images/group_summary.png b/docs/previous_versions/v0.4.0/images/group_summary.png new file mode 100755 index 000000000..2f09b0f0f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/group_summary.png differ diff --git a/docs/previous_versions/v0.4.0/images/guess_the_correlation.png b/docs/previous_versions/v0.4.0/images/guess_the_correlation.png new file mode 100755 index 000000000..fefdb23b1 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/guess_the_correlation.png differ diff --git a/docs/previous_versions/v0.4.0/images/ht.png b/docs/previous_versions/v0.4.0/images/ht.png new file mode 
100755 index 000000000..204422828 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/ht.png differ diff --git a/docs/previous_versions/v0.4.0/images/iphone.jpg b/docs/previous_versions/v0.4.0/images/iphone.jpg new file mode 100755 index 000000000..cf3a222a0 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/iphone.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/ismay.jpeg b/docs/previous_versions/v0.4.0/images/ismay.jpeg new file mode 100755 index 000000000..f68ead9ed Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/ismay.jpeg differ diff --git a/docs/previous_versions/v0.4.0/images/join-inner.png b/docs/previous_versions/v0.4.0/images/join-inner.png new file mode 100755 index 000000000..18e996daa Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/join-inner.png differ diff --git a/docs/previous_versions/v0.4.0/images/kim.jpeg b/docs/previous_versions/v0.4.0/images/kim.jpeg new file mode 100755 index 000000000..524aff3d5 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/kim.jpeg differ diff --git a/docs/previous_versions/v0.4.0/images/logos/book_cover.png b/docs/previous_versions/v0.4.0/images/logos/book_cover.png new file mode 100755 index 000000000..f20fd9ef6 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/logos/book_cover.png differ diff --git a/docs/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png b/docs/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png new file mode 100755 index 000000000..d28831d0b Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png differ diff --git a/docs/previous_versions/v0.4.0/images/logos/favicons/favicon.ico b/docs/previous_versions/v0.4.0/images/logos/favicons/favicon.ico new file mode 100755 index 000000000..bddb10a6f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/logos/favicons/favicon.ico differ diff --git a/docs/previous_versions/v0.4.0/images/mutate.png b/docs/previous_versions/v0.4.0/images/mutate.png new file mode 100755 index 000000000..ab15762b8 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/mutate.png differ diff --git a/docs/previous_versions/v0.4.0/images/read_excel.png b/docs/previous_versions/v0.4.0/images/read_excel.png new file mode 100755 index 000000000..e9467bb82 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/read_excel.png differ diff --git a/docs/previous_versions/v0.4.0/images/relational-nycflights.png b/docs/previous_versions/v0.4.0/images/relational-nycflights.png new file mode 100755 index 000000000..10b04ce0f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/relational-nycflights.png differ diff --git a/docs/previous_versions/v0.4.0/images/rstudio.png b/docs/previous_versions/v0.4.0/images/rstudio.png new file mode 100755 index 000000000..e1d286545 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/rstudio.png differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/shovel_025.jpg b/docs/previous_versions/v0.4.0/images/sampling/shovel_025.jpg new file mode 100755 index 000000000..df2c5e1d2 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/shovel_025.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/shovel_050.jpg b/docs/previous_versions/v0.4.0/images/sampling/shovel_050.jpg new file mode 100755 index 000000000..68787cf3d Binary files /dev/null and 
b/docs/previous_versions/v0.4.0/images/sampling/shovel_050.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/shovel_100.jpg b/docs/previous_versions/v0.4.0/images/sampling/shovel_100.jpg new file mode 100755 index 000000000..1cc70a70f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/shovel_100.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg b/docs/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg new file mode 100755 index 000000000..9a045406f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg b/docs/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg new file mode 100755 index 000000000..45b2791a9 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg b/docs/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg new file mode 100755 index 000000000..50ef8b56f Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg b/docs/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg new file mode 100755 index 000000000..bd20120f3 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling_bowl_2.jpg b/docs/previous_versions/v0.4.0/images/sampling_bowl_2.jpg new file mode 100755 index 000000000..48412bcfd Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling_bowl_2.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg b/docs/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg new file mode 100755 index 000000000..a38e5d063 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg differ diff --git a/docs/previous_versions/v0.4.0/images/select.png b/docs/previous_versions/v0.4.0/images/select.png new file mode 100755 index 000000000..a7329274a Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/select.png differ diff --git a/docs/previous_versions/v0.4.0/images/sign-2408065_1920.png b/docs/previous_versions/v0.4.0/images/sign-2408065_1920.png new file mode 100755 index 000000000..824dc86f0 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/sign-2408065_1920.png differ diff --git a/docs/previous_versions/v0.4.0/images/summarize1.png b/docs/previous_versions/v0.4.0/images/summarize1.png new file mode 100755 index 000000000..e52e1d984 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/summarize1.png differ diff --git a/docs/previous_versions/v0.4.0/images/summary.png b/docs/previous_versions/v0.4.0/images/summary.png new file mode 100755 index 000000000..86415225e Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/summary.png differ diff --git a/docs/previous_versions/v0.4.0/images/tidy-1.png b/docs/previous_versions/v0.4.0/images/tidy-1.png new file mode 100755 index 000000000..4287d74c6 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/tidy-1.png differ diff --git a/docs/previous_versions/v0.4.0/images/tidy1.png b/docs/previous_versions/v0.4.0/images/tidy1.png new file mode 100755 index 000000000..88771ff58 Binary files /dev/null and b/docs/previous_versions/v0.4.0/images/tidy1.png 
differ diff --git a/docs/previous_versions/v0.4.0/index.html b/docs/previous_versions/v0.4.0/index.html new file mode 100644 index 000000000..3fdbeb151 --- /dev/null +++ b/docs/previous_versions/v0.4.0/index.html @@ -0,0 +1,941 @@ + + + + + + + + An Introduction to Statistical and Data Sciences via R + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

      1 Introduction

      + +
      +
      +

      1.1 Important Note

      +

      This is a previous version (v0.4.0) of ModernDive and may be out of date. For the current version of ModernDive, please go to ModernDive.com.

      +
      +


      +

      Help! I’m new to R and RStudio and I need to learn about them! However, I’m completely new to coding! What do I do? If you’re asking yourself this question, then you’ve come to the right place! Start with our Introduction for Students.

      +
        +
      • Are you an instructor hoping to use this book in your courses? Then click here for more information on how to teach with this book.
      • +
      • Are you looking to connect with and contribute to ModernDive? Then click here for information on how.
      • +
      • Are you curious about the publishing of this book? Then click here for more information on the open-source technology, in particular R Markdown and the bookdown package.
      • +
      +

      This is version 0.4.0 of ModernDive published on July 21, 2018. For previous versions of ModernDive, see Section 1.6.

      +
      +
      +
      +

      1.2 Introduction for students

      +

      This book assumes no prerequisites: no algebra, no calculus, and no prior programming/coding experience. This is intended to be a gentle introduction to the practice of analyzing data and answering questions using data the way data scientists, statisticians, data journalists, and other researchers would.

      +

      In Figure 1.1 we present a flowchart of what you’ll cover in this book. You’ll first get started with data in Chapter 2, where you’ll learn about the difference between R and RStudio, start coding in R, understand what R packages are, and explore your first dataset: all domestic departure flights from a New York City airport in 2013. Then

      +
        +
      1. Data science: You’ll assemble your data science toolbox using tidyverse packages. In particular: +
          +
        • Ch.3: Visualizing data via the ggplot2 package.
        • +
        • Ch.4: Understanding the concept of “tidy” data as a standardized data input format for all packages in the tidyverse
        • +
        • Ch.5: Wrangling data via the dplyr package.
        • +
      2. +
      3. Data modeling: Using these data science tools and helper functions from the moderndive package, you’ll start performing data modeling. In particular: +
          +
        • Ch.6: Constructing basic regression models.
        • +
        • Ch.7: Constructing multiple regression models.
        • +
      4. +
      5. Statistical inference: Once again using your newly acquired data science tools, we’ll unpack statistical inference using the infer package. In particular: +
          +
        • Ch.8: Understanding the role that sampling variability plays in statistical inference using both tactile and virtual simulations of sampling from a “bowl” with an unknown proportion of red balls.
        • +
        • Ch.9: Building confidence intervals.
        • +
        • Ch.10: Conducting hypothesis tests.
        • +
      6. +
      7. Data modeling revisited: Armed with your new understanding of statistical inference, you’ll revisit and review the models you constructed in Ch.6 & Ch.7. In particular: +
          +
        • Ch.11: Interpreting both the statistical and practical significance of the results of the models.
        • +
      8. +
      +

      We’ll end with a discussion on what it means to “think with data” in Chapter 12 and present an example case study data analysis of house prices in Seattle.

      +

      +Figure 1.1: ModernDive Flowchart +
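
      To give a flavor of the toolbox outlined above, here is a minimal sketch of the kind of code you will build up to in the early chapters. It assumes only that the nycflights13 and ggplot2 packages mentioned in this chapter are installed; the book’s own examples may differ.

```r
# A hedged sketch, not the book's exact code: load the 2013 NYC flights data
# referenced above and draw a first ggplot2 graphic.
library(nycflights13)
library(ggplot2)

# Do flights that depart late also tend to arrive late?
ggplot(flights, aes(x = dep_delay, y = arr_delay)) +
  geom_point(alpha = 0.2)
```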

      +
      +
      +

      1.2.1 What you will learn from this book

      +

      We hope that by the end of this book, you’ll have learned

      +
        +
      1. How to use R to explore data.
        +
      2. +
      3. How to answer statistical questions using tools like confidence intervals and hypothesis tests.
      4. +
      5. How to effectively create “data stories” using these tools.
      6. +
      +

      What do we mean by data stories? We mean any analysis involving data that engages the reader in answering questions with careful visuals and thoughtful discussion, such as How strong is the relationship between per capita income and crime in Chicago neighborhoods? and How many f**ks does Quentin Tarantino give (as measured by the amount of swearing in his films)?. Further discussions on data stories can be found in this Think With Google article.

      +

      For other examples of data stories constructed by students like yourselves, look at the final projects for two courses that have previously used ModernDive:

      + +

      This book will help you develop your “data science toolbox”, including tools such as data visualization, data formatting, data wrangling, and data modeling using regression. With these tools, you’ll be able to perform the entirety of the “data/science pipeline” while building data communication skills (see Subsection 1.2.2 for more details).
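
      As a small, hedged taste of the data wrangling piece of this toolbox (a sketch only, using the dplyr and nycflights13 packages named in this chapter), here is the kind of one-step summary you will learn to write:

```r
# Sketch: average departure delay for each New York City airport of origin.
library(dplyr)
library(nycflights13)

flights %>%
  group_by(origin) %>%
  summarize(mean_dep_delay = mean(dep_delay, na.rm = TRUE))
```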

      +

      In particular, this book will lean heavily on data visualization. In today’s world, we are bombarded with graphics that attempt to convey ideas. We will explore what makes a good graphic and what the standard ways are to convey relationships with data. You’ll also see the use of visualization to introduce concepts like mean, median, standard deviation, distributions, etc. In general, we’ll use visualization as a way of building almost all of the ideas in this book.

      +

      To impart the statistical lessons in this book, we have intentionally minimized the number of mathematical formulas used and instead have focused on developing a conceptual understanding via data visualization, statistical computing, and simulations. We hope this is a more intuitive experience than the way statistics has traditionally been taught in the past and how it is commonly perceived.

      +

      Finally, you’ll learn the importance of literate programming. By this we mean you’ll learn how to write code that is useful not just for a computer to execute but also for readers to understand exactly what your analysis is doing and how you did it. This is part of a greater effort to encourage reproducible research (see Subsection 1.2.3 for more details). Hal Abelson coined the phrase that we will follow throughout this book:

      +
      +

      “Programs must be written for people to read, and only incidentally for machines to execute.”

      +
      +

      We understand that there may be challenging moments as you learn to program. Both of us continue to struggle and find ourselves often using web searches to find answers and reach out to colleagues for help. In the long run though, we all can solve problems faster and more elegantly via programming. We wrote this book as our way to help you get started and you should know that there is a huge community of R users that are always happy to help everyone along as well. This community exists in particular on the internet on various forums and websites such as stackoverflow.com.

      +
      +
      +

      1.2.2 Data/science pipeline

      +

      You may think of statistics as just being a bunch of numbers. We commonly hear the phrase “statistician” when listening to broadcasts of sporting events. Statistics (in particular, data analysis), in addition to describing numbers such as baseball batting averages, plays a vital role in all of the sciences. You’ll commonly hear the phrase “statistically significant” thrown around in the media. You’ll see articles that say “Science now shows that chocolate is good for you.” Underpinning these claims is data analysis. By the end of this book, you’ll be able to better understand whether these claims should be trusted or whether we should be wary. Inside data analysis are many sub-fields that we will discuss throughout this book (though not necessarily in this order):

      +
        +
      • data collection
      • +
      • data wrangling
      • +
      • data visualization
      • +
      • data modeling
      • +
      • inference
      • +
      • correlation and regression
      • +
      • interpretation of results
      • +
      • data communication/storytelling
      • +
      +

      These sub-fields are summarized in what Grolemund and Wickham term the “Data/Science Pipeline” in Figure 1.2.

      +

      +Figure 1.2: Data/Science Pipeline +

      +
      +

      We will begin by digging into the gray Understand portion of the cycle with data visualization, then with a discussion on what is meant by tidy data and data wrangling, and then conclude by talking about interpreting and discussing the results of our models via Communication. These steps are vital to any statistical analysis. But why should you care about statistics? “Why did they make me take this class?”

      +

      There’s a reason so many fields require a statistics course. Scientific knowledge grows through an understanding of statistical significance and data analysis. You needn’t be intimidated by statistics. It’s not the beast that it used to be and, paired with computation, you’ll see how reproducible research in the sciences particularly increases scientific knowledge.

      +
      +
      +

      1.2.3 Reproducible research

      +
      +

      “The most important tool is the mindset, when starting, that the end product will be reproducible.” – Keith Baggerly

      +
      +

      Another goal of this book is to help readers understand the importance of reproducible analyses. The hope is to get readers into the habit of making their analyses reproducible from the very beginning. This means we’ll be trying to help you build new habits. This will take practice and be difficult at times. You’ll see just why it is so important for you to keep track of your code and document it well, both to help yourself later and to help any potential collaborators.

      +

      Copying and pasting results from one program into a word processor is not the way that efficient and effective scientific research is conducted. It’s much more important for time to be spent on data collection and data analysis and not on copying and pasting plots back and forth across a variety of programs.

      +

      In a traditional analysis, if an error was made with the original data, we’d need to step through the entire process again: recreate the plots and copy and paste all of the new plots and our statistical analysis back into our document. This is error-prone and a frustrating use of time. We’ll see how to use R Markdown to get away from this tedious activity so that we can spend more time doing science.

      +
      +

      “We are talking about computational reproducibility.” - Yihui Xie

      +
      +

      Reproducibility means a lot of things in terms of different scientific fields. Are experiments conducted in a way that another researcher could follow the steps and get similar results? In this book, we will focus on what is known as computational reproducibility. This refers to being able to pass all of one’s data analysis, data-sets, and conclusions to someone else and have them get exactly the same results on their machine. This allows for time to be spent interpreting results and considering assumptions instead of the more error prone way of starting from scratch or following a list of steps that may be different from machine to machine.
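
      For a tiny, concrete illustration of computational reproducibility (a sketch, not an example from the book): because the random seed is fixed below, anyone who runs this code gets exactly the same simulated result on their machine.

```r
# Fixing the seed makes the simulated result identical on every machine,
# which is the core idea behind computational reproducibility.
set.seed(76)
simulated_mean <- mean(rnorm(n = 1000, mean = 0, sd = 1))
simulated_mean
```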

      + +
      +
      +

      1.2.4 Final note for students

      +

      At this point, if you are interested in instructor perspectives on this book, ways to contribute and collaborate, or the technical details of this book’s construction and publishing, then continue with the rest of the chapter below. Otherwise, let’s get started with R and RStudio in Chapter 2!


      1.3 Introduction for instructors


      This book is inspired by the following books:

• “Mathematical Statistics with Resampling and R” (Chihara and Hesterberg 2011),
• “OpenIntro: Intro Stat with Randomization and Simulation” (Diez, Barr, and Çetinkaya-Rundel 2014), and
• “R for Data Science” (Grolemund and Wickham 2016).

The first book, while designed for upper-level undergraduates and graduate students, provides an excellent resource on how to use resampling to impart statistical concepts like sampling distributions, using computation instead of large-sample approximations and other mathematical formulas. The last two books are free options for learning introductory statistics and data science, providing an alternative to the many traditionally expensive introductory statistics textbooks.


When looking over the large number of introductory statistics textbooks that currently exist, we found that there wasn’t one that incorporated many newly developed R packages directly into the text, in particular the many packages included in the tidyverse collection, such as ggplot2, dplyr, tidyr, and broom. Additionally, there wasn’t an open-source and easily reproducible textbook available that exposed new learners to all three of the learning goals listed at the outset of Subsection 1.2.1.


      1.3.1 Who is this book for?


This book is intended for instructors of traditional introductory statistics classes using RStudio, either the desktop or server version, who would like to inject more data science topics into their syllabus. We assume that students taking the class will have no prior algebra, calculus, or programming/coding experience.


      Here are some principles and beliefs we kept in mind while writing this text. If you agree with them, this might be the book for you.

1. Blur the lines between lecture and lab

  • With increased availability and accessibility of laptops and open-source non-proprietary statistical software, the strict dichotomy between lab and lecture can be loosened.
  • It’s much harder for students to understand the importance of using software if they only use it once a week or less. They forget the syntax in much the same way someone learning a foreign language forgets the rules. Frequent reinforcement is key.

2. Focus on the entire data/science research pipeline

3. It’s all about the data

  • We leverage R packages for rich, real, and realistic data-sets that at the same time are easy-to-load into R, such as the nycflights13 and fivethirtyeight packages (see the short sketch after this list).
  • We believe that data visualization is a gateway drug for statistics and that the Grammar of Graphics as implemented in the ggplot2 package is the best way to impart such lessons. However, we often hear: “You can’t teach ggplot2 for data visualization in intro stats!” We, like David Robinson, are much more optimistic.
  • dplyr has made data wrangling much more accessible to novices, and hence much more interesting data-sets can be explored.

4. Use simulation/resampling to introduce statistical inference, not probability/mathematical formulas

  • Instead of using formulas, large-sample approximations, and probability tables, we teach statistical concepts using resampling-based inference.
  • This allows for a de-emphasis of traditional probability topics, freeing up room in the syllabus for other topics.

5. Don’t fence off students from the computation pool, throw them in!

  • Computing skills are essential to working with data in the 21st century. Given this fact, we feel that to shield students from computing is to ultimately do them a disservice.
  • We are not teaching a course on coding/programming per se, but rather just enough of the computational and algorithmic thinking necessary for data analysis.

6. Complete reproducibility and customizability

  • We are frustrated when textbooks give examples, but not the source code and the data itself. We give you the source code for all examples as well as the whole book!
  • Ultimately the best textbook is one you’ve written yourself. You know best your audience, their background, and their priorities. You know best your own style and the types of examples and problems you like best. Customization is the ultimate end. For more about how to make this book your own, see About this Book.
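
As a brief illustration of these principles in action, here is a small sketch, assuming the nycflights13, dplyr, and ggplot2 packages are installed (the particular variables and carrier chosen are only illustrative), of the kind of real-data workflow the book builds toward:

```r
library(nycflights13)   # all domestic flights departing New York City in 2013
library(dplyr)
library(ggplot2)

# Data wrangling with dplyr: mean arrival delay for each carrier
flights %>%
  group_by(carrier) %>%
  summarize(mean_arr_delay = mean(arr_delay, na.rm = TRUE)) %>%
  arrange(desc(mean_arr_delay))

# Data visualization with ggplot2: departure vs. arrival delays
# for a single carrier
ggplot(flights %>% filter(carrier == "AS"),
       aes(x = dep_delay, y = arr_delay)) +
  geom_point()
```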

      1.4 DataCamp


DataCamp is a browser-based interactive platform for learning data science, offering a wide array of courses on data science, analytics, statistics, machine learning, and artificial intelligence. Each course is a combination of lectures and exercises that offer immediate feedback.


The chapters of ModernDive roughly map to the following closely integrated DataCamp courses, which use the same R tools and often even the same datasets. This is by no means an exhaustive list of DataCamp courses relevant to the topics in this book; rather, these are the ones we recommend in particular to supplement your ModernDive experience.


Click on the image for each course to access its webpage on datacamp.com. Instructors at accredited universities can sign their class up for a free academic license at DataCamp For The Classroom, giving their students free access to all premium courses for 6 months.

| Chapter | Topic | DataCamp Courses |
|---------|-------|------------------|
| 2 | Basic R programming concepts | (two course images) |
| 3 & 5 | Introductory data visualization and wrangling | (course image) |
| 4 & 5 | Data “tidying” and intermediate data wrangling | (course image) |
| 6 & 7 | Data modeling, basic regression, and multiple regression | (course image) |
| 9 & 10 | Statistical inference: confidence intervals and hypothesis testing | (two course images) |
| 11 | Inference for regression | (course image) |

      1.5 Connect and contribute


      If you would like to connect with ModernDive, check out the following links:


      If you would like to contribute to ModernDive, there are many ways! Let’s all work together to make this book as great as possible for as many students and instructors as possible!

• Please let us know if you find any errors, typos, or areas for improvement on our GitHub issues page.
• If you are familiar with GitHub and would like to contribute more, please see Section 1.6 below.

      The authors would like to thank Nina Sonneborn, Kristin Bott, and the participants of our USCOTS 2017 workshop for their feedback and suggestions. A special thanks goes to Prof. Yana Weinstein, cognitive psychological scientist and co-founder of The Learning Scientists, for her extensive contributions.


      1.6 About this book


      This book was written using RStudio’s bookdown package by Yihui Xie (Xie 2018). This package simplifies the publishing of books by having all content written in R Markdown. The bookdown/R Markdown source code for all versions of ModernDive is available on GitHub:


Could this be a new paradigm for textbooks? Instead of the traditional model of textbook companies publishing updated editions every few years, we apply a model influenced by software design, publishing more easily updated versions. We can then leverage open-source communities of instructors and developers for ideas, tools, resources, and feedback. As such, we welcome your pull requests.


      Finally, feel free to modify the book as you wish for your own needs, but please list the authors at the top of index.Rmd as “Chester Ismay, Albert Y. Kim, and YOU!”
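
For instructors who do want to customize the book, here is a minimal sketch of building a local copy, assuming you have cloned the book’s source repository and installed the packages it uses (the output formats themselves are configured in the repository’s _output.yml):

```r
install.packages("bookdown")         # once
bookdown::render_book("index.Rmd")   # rebuild the whole book from its .Rmd sources
```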


      1.7 About the authors


      Who we are!

Chester Ismay and Albert Y. Kim (author photos)
e=this.maindiv_;this.graphDiv=document.createElement("div"),this.graphDiv.style.textAlign="left",this.graphDiv.style.position="relative",e.appendChild(this.graphDiv),this.canvas_=t.createCanvas(),this.canvas_.style.position="absolute",this.hidden_=this.createPlotKitCanvas_(this.canvas_),this.canvas_ctx_=t.getContext(this.canvas_),this.hidden_ctx_=t.getContext(this.hidden_),this.resizeElements_(),this.graphDiv.appendChild(this.hidden_),this.graphDiv.appendChild(this.canvas_),this.mouseEventElement_=this.createMouseEventElement_(),this.layout_=new DygraphLayout(this);var a=this;this.mouseMoveHandler_=function(t){a.mouseMove_(t)},this.mouseOutHandler_=function(e){var i=e.target||e.fromElement,r=e.relatedTarget||e.toElement;t.isNodeContainedBy(i,a.graphDiv)&&!t.isNodeContainedBy(r,a.graphDiv)&&a.mouseOut_(e)},this.addAndTrackEvent(window,"mouseout",this.mouseOutHandler_),this.addAndTrackEvent(this.mouseEventElement_,"mousemove",this.mouseMoveHandler_),this.resizeHandler_||(this.resizeHandler_=function(t){a.resize()},this.addAndTrackEvent(window,"resize",this.resizeHandler_))},t.prototype.resizeElements_=function(){this.graphDiv.style.width=this.width_+"px",this.graphDiv.style.height=this.height_+"px";var e=t.getContextPixelRatio(this.canvas_ctx_);this.canvas_.width=this.width_*e,this.canvas_.height=this.height_*e,this.canvas_.style.width=this.width_+"px",this.canvas_.style.height=this.height_+"px",1!==e&&this.canvas_ctx_.scale(e,e);var a=t.getContextPixelRatio(this.hidden_ctx_);this.hidden_.width=this.width_*a,this.hidden_.height=this.height_*a,this.hidden_.style.width=this.width_+"px",this.hidden_.style.height=this.height_+"px",1!==a&&this.hidden_ctx_.scale(a,a)},t.prototype.destroy=function(){this.canvas_ctx_.restore(),this.hidden_ctx_.restore();for(var e=this.plugins_.length-1;e>=0;e--){var a=this.plugins_.pop();a.plugin.destroy&&a.plugin.destroy()}var i=function(t){for(;t.hasChildNodes();)i(t.firstChild),t.removeChild(t.firstChild)};this.removeTrackedEvents_(),t.removeEvent(window,"mouseout",this.mouseOutHandler_),t.removeEvent(this.mouseEventElement_,"mousemove",this.mouseMoveHandler_),t.removeEvent(window,"resize",this.resizeHandler_),this.resizeHandler_=null,i(this.maindiv_);var r=function(t){for(var e in t)"object"==typeof t[e]&&(t[e]=null)};r(this.layout_),r(this.plotter_),r(this)},t.prototype.createPlotKitCanvas_=function(e){var a=t.createCanvas();return a.style.position="absolute",a.style.top=e.style.top,a.style.left=e.style.left,a.width=this.width_,a.height=this.height_,a.style.width=this.width_+"px",a.style.height=this.height_+"px",a},t.prototype.createMouseEventElement_=function(){if(this.isUsingExcanvas_){var t=document.createElement("div");return t.style.position="absolute",t.style.backgroundColor="white",t.style.filter="alpha(opacity=0)",t.style.width=this.width_+"px",t.style.height=this.height_+"px",this.graphDiv.appendChild(t),t}return this.canvas_},t.prototype.setColors_=function(){var e=this.getLabels(),a=e.length-1;this.colors_=[],this.colorsMap_={};for(var i=this.getNumericOption("colorSaturation")||1,r=this.getNumericOption("colorValue")||.5,n=Math.ceil(a/2),o=this.getOption("colors"),s=this.visibility(),l=0;a>l;l++)if(s[l]){ +var h=e[l+1],p=this.attributes_.getForSeries("color",h);if(!p)if(o)p=o[l%o.length];else{var g=l%2?n+(l+1)/2:Math.ceil((l+1)/2),d=1*g/(1+a);p=t.hsvToRGB(d,i,r)}this.colors_.push(p),this.colorsMap_[h]=p}},t.prototype.getColors=function(){return this.colors_},t.prototype.getPropertiesForSeries=function(t){for(var 
e=-1,a=this.getLabels(),i=1;i=o;o++)s=t.zoomAnimationFunction(o,l),h[o-1]=[e[0]*(1-s)+s*a[0],e[1]*(1-s)+s*a[1]];if(null!==i&&null!==r)for(o=1;l>=o;o++){s=t.zoomAnimationFunction(o,l);for(var g=[],d=0;dl;l++){var h=o[l];if(t.isValidPoint(h,!0)){var p=Math.abs(h.canvasx-e);a>p&&(a=p,i=h.idx)}}return i},t.prototype.findClosestPoint=function(e,a){for(var i,r,n,o,s,l,h,p=1/0,g=this.layout_.points.length-1;g>=0;--g)for(var d=this.layout_.points[g],u=0;ui&&(p=i,s=o,l=g,h=o.idx));var c=this.layout_.setNames[l];return{row:h,seriesName:c,point:s}},t.prototype.findStackedPoint=function(e,a){for(var i,r,n=this.findClosestRow(e),o=0;o=h.length)){var p=h[l];if(t.isValidPoint(p)){var g=p.canvasy;if(e>p.canvasx&&l+10){var c=(e-p.canvasx)/u;g+=c*(d.canvasy-p.canvasy)}}}else if(e0){var y=h[l-1];if(t.isValidPoint(y)){var u=p.canvasx-y.canvasx;if(u>0){var c=(p.canvasx-e)/u;g+=c*(y.canvasy-p.canvasy)}}}(0===o||a>g)&&(i=p,r=o)}}}var _=this.layout_.setNames[r];return{row:n,seriesName:_,point:i}},t.prototype.mouseMove_=function(t){var e=this.layout_.points;if(void 0!==e&&null!==e){var a=this.eventToDomCoords(t),i=a[0],r=a[1],n=this.getOption("highlightSeriesOpts"),o=!1;if(n&&!this.isSeriesLocked()){var s;s=this.getBooleanOption("stackedGraph")?this.findStackedPoint(i,r):this.findClosestPoint(i,r),o=this.setSelection(s.row,s.seriesName)}else{var l=this.findClosestRow(i);o=this.setSelection(l)}var h=this.getFunctionOption("highlightCallback");h&&o&&h.call(this,t,this.lastx_,this.selPoints_,this.lastRow_,this.highlightSet_)}},t.prototype.getLeftBoundary_=function(t){if(this.boundaryIds_[t])return this.boundaryIds_[t][0];for(var e=0;ee?r:a-r;if(0>=n)return void(this.fadeLevel&&this.updateSelection_(1));var o=++this.animateId,s=this;t.repeatAndCleanup(function(t){s.animateId==o&&(s.fadeLevel+=e,0===s.fadeLevel?s.clearSelection():s.updateSelection_(s.fadeLevel/a))},n,i,function(){})},t.prototype.updateSelection_=function(e){this.cascadeEvents_("select",{selectedRow:this.lastRow_,selectedX:this.lastx_,selectedPoints:this.selPoints_});var a,i=this.canvas_ctx_;if(this.getOption("highlightSeriesOpts")){i.clearRect(0,0,this.width_,this.height_);var r=1-this.getNumericOption("highlightSeriesBackgroundAlpha");if(r){var n=!0;if(n){if(void 0===e)return void this.animateSelection_(1);r*=e}i.fillStyle="rgba(255,255,255,"+r+")",i.fillRect(0,0,this.width_,this.height_)}this.plotter_._renderLineChart(this.highlightSet_,i)}else if(this.previousVerticalX_>=0){var o=0,s=this.attr_("labels");for(a=1;ao&&(o=l)}var h=this.previousVerticalX_;i.clearRect(h-o-1,0,2*o+2,this.height_)}if(this.isUsingExcanvas_&&this.currentZoomRectArgs_&&t.prototype.drawZoomRect_.apply(this,this.currentZoomRectArgs_),this.selPoints_.length>0){var p=this.selPoints_[0].canvasx;for(i.save(),a=0;a=0){t!=this.lastRow_&&(i=!0),this.lastRow_=t;for(var r=0;r=0&&(i=!0),this.lastRow_=-1;return this.selPoints_.length?this.lastx_=this.selPoints_[0].xval:this.lastx_=-1,void 0!==e&&(this.highlightSet_!==e&&(i=!0),this.highlightSet_=e),void 0!==a&&(this.lockedSet_=a),i&&this.updateSelection_(void 0),i},t.prototype.mouseOut_=function(t){this.getFunctionOption("unhighlightCallback")&&this.getFunctionOption("unhighlightCallback").call(this,t),this.getBooleanOption("hideOverlayOnMouseOut")&&!this.lockedSet_&&this.clearSelection()},t.prototype.clearSelection=function(){return this.cascadeEvents_("deselect",{}),this.lockedSet_=!1,this.fadeLevel?void 
this.animateSelection_(-1):(this.canvas_ctx_.clearRect(0,0,this.width_,this.height_),this.fadeLevel=0,this.selPoints_=[],this.lastx_=-1,this.lastRow_=-1,void(this.highlightSet_=null))},t.prototype.getSelection=function(){if(!this.selPoints_||this.selPoints_.length<1)return-1;for(var t=0;t1&&(a=this.dataHandler_.rollingAverage(a,this.rollPeriod_,this.attributes_)),this.rolledSeries_.push(a)}this.drawGraph_();var i=new Date;this.drawingTimeMs_=i-t},t.PointType=void 0,t.stackPoints_=function(t,e,a,i){for(var r=null,n=null,o=null,s=-1,l=function(e){if(!(s>=e))for(var a=e;aa[1]&&(a[1]=u),u=1;i--)if(this.visibility()[i-1]){if(a){l=e[i];var c=a[0],y=a[1];for(n=null,o=null,r=0;r=c&&null===n&&(n=r),l[r][0]<=y&&(o=r);null===n&&(n=0);for(var _=n,v=!0;v&&_>0;)_--,v=null===l[_][1];null===o&&(o=l.length-1);var f=o;for(v=!0;v&&f0&&(this.setIndexByName_[n[0]]=0);for(var o=0,s=1;s0;){var a=this.readyFns_.pop();a(this)}},t.prototype.computeYAxes_=function(){var e,a,i,r,n;if(void 0!==this.axes_&&this.user_attrs_.hasOwnProperty("valueRange")===!1)for(e=[],i=0;ii;i++)this.axes_[i].valueWindow=e[i]}for(a=0;al;l++){var h=this.axes_[l],p=this.attributes_.getForAxis("logscale",l),g=this.attributes_.getForAxis("includeZero",l),d=this.attributes_.getForAxis("independentTicks",l);if(i=this.attributes_.seriesForAxis(l),e=!0,r=.1,null!==this.getNumericOption("yRangePad")&&(e=!1,r=this.getNumericOption("yRangePad")/this.plotter_.area.h),0===i.length)h.extremeRange=[0,1];else{for(var u,c,y=1/0,_=-(1/0),v=0;v0&&(y=0),0>_&&(_=0)),y==1/0&&(y=0),_==-(1/0)&&(_=1),a=_-y,0===a&&(0!==_?a=Math.abs(_):(_=1,a=1));var f,x;if(p)if(e)f=_+r*a,x=y;else{var m=Math.exp(Math.log(a)*r);f=_*m,x=y/m}else f=_+r*a,x=y-r*a,e&&!this.getBooleanOption("avoidMinZero")&&(0>x&&y>=0&&(x=0),f>0&&0>=_&&(f=0));h.extremeRange=[x,f]}if(h.valueWindow)h.computedValueRange=[h.valueWindow[0],h.valueWindow[1]];else if(h.valueRange){var D=o(h.valueRange[0])?h.extremeRange[0]:h.valueRange[0],w=o(h.valueRange[1])?h.extremeRange[1]:h.valueRange[1];if(!e)if(h.logscale){var m=Math.exp(Math.log(a)*r);D*=m,w/=m}else a=w-D,D-=a*r,w+=a*r;h.computedValueRange=[D,w]}else h.computedValueRange=h.extremeRange;if(d){h.independentTicks=d;var A=this.optionsViewForAxis_("y"+(l?"2":"")),b=A("ticker");h.ticks=b(h.computedValueRange[0],h.computedValueRange[1],this.plotter_.area.h,A,this),n||(n=h)}}if(void 0===n)throw'Configuration Error: At least one axis has to have the "independentTicks" option activated.';for(var l=0;s>l;l++){var h=this.axes_[l];if(!h.independentTicks){for(var A=this.optionsViewForAxis_("y"+(l?"2":"")),b=A("ticker"),T=n.ticks,E=n.computedValueRange[1]-n.computedValueRange[0],C=h.computedValueRange[1]-h.computedValueRange[0],L=[],P=0;P0&&"e"!=t[a-1]&&"E"!=t[a-1]||t.indexOf("/")>=0||isNaN(parseFloat(t))?e=!0:8==t.length&&t>"19700101"&&"20371231">t&&(e=!0),this.setXAxisOptions_(e)},t.prototype.setXAxisOptions_=function(e){e?(this.attrs_.xValueParser=t.dateParser,this.attrs_.axes.x.valueFormatter=t.dateValueFormatter,this.attrs_.axes.x.ticker=t.dateTicker,this.attrs_.axes.x.axisLabelFormatter=t.dateAxisLabelFormatter):(this.attrs_.xValueParser=function(t){return parseFloat(t)},this.attrs_.axes.x.valueFormatter=function(t){return t},this.attrs_.axes.x.ticker=t.numericTicks,this.attrs_.axes.x.axisLabelFormatter=this.attrs_.axes.x.valueFormatter)},t.prototype.parseCSV_=function(e){var a,i,r=[],n=t.detectLineDelimiter(e),o=e.split(n||"\n"),s=this.getStringOption("delimiter");-1==o[0].indexOf(s)&&o[0].indexOf(" ")>=0&&(s=" ");var l=0;"labels"in 
this.user_attrs_||(l=1,this.attrs_.labels=o[0].split(s),this.attributes_.reparseSeries());for(var h,p=0,g=!1,d=this.attr_("labels").length,u=!1,c=l;c0&&v[0]0;)e=String.fromCharCode(65+(t-1)%26)+e.toLowerCase(),t=Math.floor((t-1)/26);return e},i=e.getNumberOfColumns(),r=e.getNumberOfRows(),n=e.getColumnType(0);if("date"==n||"datetime"==n)this.attrs_.xValueParser=t.dateParser,this.attrs_.axes.x.valueFormatter=t.dateValueFormatter,this.attrs_.axes.x.ticker=t.dateTicker,this.attrs_.axes.x.axisLabelFormatter=t.dateAxisLabelFormatter;else{if("number"!=n)return console.error("only 'date', 'datetime' and 'number' types are supported for column 1 of DataTable input (Got '"+n+"')"),null;this.attrs_.xValueParser=function(t){return parseFloat(t)},this.attrs_.axes.x.valueFormatter=function(t){return t},this.attrs_.axes.x.ticker=t.numericTicks,this.attrs_.axes.x.axisLabelFormatter=this.attrs_.axes.x.valueFormatter}var o,s,l=[],h={},p=!1;for(o=1;i>o;o++){var g=e.getColumnType(o);if("number"==g)l.push(o);else if("string"==g&&this.getBooleanOption("displayAnnotations")){var d=l[l.length-1];h.hasOwnProperty(d)?h[d].push(o):h[d]=[o],p=!0}else console.error("Only 'number' is supported as a dependent type with Gviz. 'string' is only supported if displayAnnotations is true")}var u=[e.getColumnLabel(0)];for(o=0;oo;o++){var v=[];if("undefined"!=typeof e.getValue(o,0)&&null!==e.getValue(o,0)){if(v.push("date"==n||"datetime"==n?e.getValue(o,0).getTime():e.getValue(o,0)),this.getBooleanOption("errorBars"))for(s=0;i-1>s;s++)v.push([e.getValue(o,1+2*s),e.getValue(o,2+2*s)]);else{for(s=0;s0&&v[0]0&&this.setAnnotations(_,!0),this.attributes_.reparseSeries()},t.prototype.cascadeDataDidUpdateEvent_=function(){this.cascadeEvents_("dataDidUpdate",{})},t.prototype.start_=function(){var e=this.file_;if("function"==typeof e&&(e=e()),t.isArrayLike(e))this.rawData_=this.parseArray_(e),this.cascadeDataDidUpdateEvent_(),this.predraw_();else if("object"==typeof e&&"function"==typeof e.getColumnRange)this.parseDataTable_(e),this.cascadeDataDidUpdateEvent_(),this.predraw_();else if("string"==typeof e){var a=t.detectLineDelimiter(e);if(a)this.loadedEvent_(e);else{var i;i=window.XMLHttpRequest?new XMLHttpRequest:new ActiveXObject("Microsoft.XMLHTTP");var r=this;i.onreadystatechange=function(){4==i.readyState&&(200===i.status||0===i.status)&&r.loadedEvent_(i.responseText)},i.open("GET",e,!0),i.send(null)}}else console.error("Unknown data format: "+typeof e)},t.prototype.updateOptions=function(e,a){"undefined"==typeof a&&(a=!1);var i=e.file,r=t.mapLegacyOptions_(e);"rollPeriod"in r&&(this.rollPeriod_=r.rollPeriod),"dateWindow"in r&&(this.dateWindow_=r.dateWindow,"isZoomedIgnoreProgrammaticZoom"in r||(this.zoomed_x_=null!==r.dateWindow)),"valueRange"in r&&!("isZoomedIgnoreProgrammaticZoom"in r)&&(this.zoomed_y_=null!==r.valueRange);var n=t.isPixelChangingOptionList(this.attr_("labels"),r);t.updateDeep(this.user_attrs_,r),this.attributes_.reparseSeries(),i?(this.cascadeEvents_("dataWillUpdate",{}),this.file_=i,a||this.start_()):a||(n?this.predraw_():this.renderGraph_(!1))},t.mapLegacyOptions_=function(t){var e={};for(var a in t)t.hasOwnProperty(a)&&"file"!=a&&t.hasOwnProperty(a)&&(e[a]=t[a]);var i=function(t,a,i){e.axes||(e.axes={}),e.axes[t]||(e.axes[t]={}),e.axes[t][a]=i},r=function(a,r,n){"undefined"!=typeof t[a]&&(console.warn("Option "+a+" is deprecated. Use the "+n+" option for the "+r+" axis instead. (e.g. { axes : { "+r+" : { "+n+" : ... 
} } } (see http://dygraphs.com/per-axis.html for more information."),i(r,n,t[a]),delete e[a])};return r("xValueFormatter","x","valueFormatter"),r("pixelsPerXLabel","x","pixelsPerLabel"),r("xAxisLabelFormatter","x","axisLabelFormatter"),r("xTicker","x","ticker"),r("yValueFormatter","y","valueFormatter"),r("pixelsPerYLabel","y","pixelsPerLabel"),r("yAxisLabelFormatter","y","axisLabelFormatter"),r("yTicker","y","ticker"),r("drawXGrid","x","drawGrid"),r("drawXAxis","x","drawAxis"),r("drawYGrid","y","drawGrid"),r("drawYAxis","y","drawAxis"),r("xAxisLabelWidth","x","axisLabelWidth"),r("yAxisLabelWidth","y","axisLabelWidth"),e},t.prototype.resize=function(t,e){if(!this.resize_lock){this.resize_lock=!0,null===t!=(null===e)&&(console.warn("Dygraph.resize() should be called with zero parameters or two non-NULL parameters. Pretending it was zero."),t=e=null);var a=this.width_,i=this.height_;t?(this.maindiv_.style.width=t+"px",this.maindiv_.style.height=e+"px",this.width_=t,this.height_=e):(this.width_=this.maindiv_.clientWidth,this.height_=this.maindiv_.clientHeight),(a!=this.width_||i!=this.height_)&&(this.resizeElements_(),this.predraw_()),this.resize_lock=!1}},t.prototype.adjustRoll=function(t){this.rollPeriod_=t,this.predraw_()},t.prototype.visibility=function(){for(this.getOption("visibility")||(this.attrs_.visibility=[]);this.getOption("visibility").lengtht||t>=a.length?console.warn("invalid series number in setVisibility: "+t):(a[t]=e,this.predraw_())},t.prototype.size=function(){return{width:this.width_,height:this.height_}},t.prototype.setAnnotations=function(e,a){return t.addAnnotationRule(),this.annotations_=e,this.layout_?(this.layout_.setAnnotations(this.annotations_),void(a||this.predraw_())):void console.warn("Tried to setAnnotations before dygraph was ready. Try setting them in a ready() block. 
See dygraphs.com/tests/annotation.html")},t.prototype.annotations=function(){return this.annotations_},t.prototype.getLabels=function(){var t=this.attr_("labels");return t?t.slice():null},t.prototype.indexFromSetName=function(t){return this.setIndexByName_[t]},t.prototype.ready=function(t){this.is_initial_draw_?this.readyFns_.push(t):t.call(this,this)},t.addAnnotationRule=function(){if(!t.addedAnnotationCSS){var e="border: 1px solid black; background-color: white; text-align: center;",a=document.createElement("style");a.type="text/css",document.getElementsByTagName("head")[0].appendChild(a);for(var i=0;it?"0"+t:""+t},Dygraph.DateAccessorsLocal={getFullYear:function(t){return t.getFullYear()},getMonth:function(t){return t.getMonth()},getDate:function(t){return t.getDate()},getHours:function(t){return t.getHours()},getMinutes:function(t){return t.getMinutes()},getSeconds:function(t){return t.getSeconds()},getMilliseconds:function(t){return t.getMilliseconds()},getDay:function(t){return t.getDay()},makeDate:function(t,e,a,i,r,n,o){return new Date(t,e,a,i,r,n,o)}},Dygraph.DateAccessorsUTC={getFullYear:function(t){return t.getUTCFullYear()},getMonth:function(t){return t.getUTCMonth()},getDate:function(t){return t.getUTCDate()},getHours:function(t){return t.getUTCHours()},getMinutes:function(t){return t.getUTCMinutes()},getSeconds:function(t){return t.getUTCSeconds()},getMilliseconds:function(t){return t.getUTCMilliseconds()},getDay:function(t){return t.getUTCDay()},makeDate:function(t,e,a,i,r,n,o){return new Date(Date.UTC(t,e,a,i,r,n,o))}},Dygraph.hmsString_=function(t,e,a){var i=Dygraph.zeropad,r=i(t)+":"+i(e);return a&&(r+=":"+i(a)),r},Dygraph.dateString_=function(t,e){var a=Dygraph.zeropad,i=e?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal,r=new Date(t),n=i.getFullYear(r),o=i.getMonth(r),s=i.getDate(r),l=i.getHours(r),h=i.getMinutes(r),p=i.getSeconds(r),g=""+n,d=a(o+1),u=a(s),c=3600*l+60*h+p,y=g+"/"+d+"/"+u;return c&&(y+=" "+Dygraph.hmsString_(l,h,p)),y},Dygraph.round_=function(t,e){var a=Math.pow(10,e);return Math.round(t*a)/a},Dygraph.binarySearch=function(t,e,a,i,r){if((null===i||void 0===i||null===r||void 0===r)&&(i=0,r=e.length-1),i>r)return-1;(null===a||void 0===a)&&(a=0);var n,o=function(t){return t>=0&&tt?a>0&&(n=s-1,o(n)&&e[n]l?0>a&&(n=s+1,o(n)&&e[n]>t)?s:Dygraph.binarySearch(t,e,a,s+1,r):-1},Dygraph.dateParser=function(t){var e,a;if((-1==t.search("-")||-1!=t.search("T")||-1!=t.search("Z"))&&(a=Dygraph.dateStrToMillis(t),a&&!isNaN(a)))return a;if(-1!=t.search("-")){for(e=t.replace("-","/","g");-1!=e.search("-");)e=e.replace("-","/");a=Dygraph.dateStrToMillis(e)}else 8==t.length?(e=t.substr(0,4)+"/"+t.substr(4,2)+"/"+t.substr(6,2),a=Dygraph.dateStrToMillis(e)):a=Dygraph.dateStrToMillis(t);return(!a||isNaN(a))&&console.error("Couldn't parse "+t+" as a date"),a},Dygraph.dateStrToMillis=function(t){return new Date(t).getTime()},Dygraph.update=function(t,e){if("undefined"!=typeof e&&null!==e)for(var a in e)e.hasOwnProperty(a)&&(t[a]=e[a]);return t},Dygraph.updateDeep=function(t,e){function a(t){return"object"==typeof Node?t instanceof Node:"object"==typeof t&&"number"==typeof t.nodeType&&"string"==typeof t.nodeName}if("undefined"!=typeof e&&null!==e)for(var i in e)e.hasOwnProperty(i)&&(null===e[i]?t[i]=null:Dygraph.isArrayLike(e[i])?t[i]=e[i].slice():a(e[i])?t[i]=e[i]:"object"==typeof e[i]?(("object"!=typeof t[i]||null===t[i])&&(t[i]={}),Dygraph.updateDeep(t[i],e[i])):t[i]=e[i]);return t},Dygraph.isArrayLike=function(t){var e=typeof 
t;return"object"!=e&&("function"!=e||"function"!=typeof t.item)||null===t||"number"!=typeof t.length||3===t.nodeType?!1:!0},Dygraph.isDateLike=function(t){return"object"!=typeof t||null===t||"function"!=typeof t.getTime?!1:!0},Dygraph.clone=function(t){for(var e=[],a=0;a=e||Dygraph.requestAnimFrame.call(window,function(){var e=(new Date).getTime(),h=e-o;r=n,n=Math.floor(h/a);var p=n-r,g=n+p>s;g||n>=s?(t(s),i()):(0!==p&&t(n),l())})}()};var e={annotationClickHandler:!0,annotationDblClickHandler:!0,annotationMouseOutHandler:!0,annotationMouseOverHandler:!0,axisLabelColor:!0,axisLineColor:!0,axisLineWidth:!0,clickCallback:!0,drawCallback:!0,drawHighlightPointCallback:!0,drawPoints:!0,drawPointCallback:!0,drawXGrid:!0,drawYGrid:!0,fillAlpha:!0,gridLineColor:!0,gridLineWidth:!0,hideOverlayOnMouseOut:!0,highlightCallback:!0,highlightCircleSize:!0,interactionModel:!0,isZoomedIgnoreProgrammaticZoom:!0,labelsDiv:!0,labelsDivStyles:!0,labelsDivWidth:!0,labelsKMB:!0,labelsKMG2:!0,labelsSeparateLines:!0,labelsShowZeroValues:!0,legend:!0,panEdgeFraction:!0,pixelsPerYLabel:!0,pointClickCallback:!0,pointSize:!0,rangeSelectorPlotFillColor:!0,rangeSelectorPlotStrokeColor:!0,showLabelsOnHighlight:!0,showRoller:!0,strokeWidth:!0,underlayCallback:!0,unhighlightCallback:!0,zoomCallback:!0};Dygraph.isPixelChangingOptionList=function(t,a){var i={};if(t)for(var r=1;re?1/Math.pow(t,-e):Math.pow(t,e)};var a=/^rgba?\((\d{1,3}),\s*(\d{1,3}),\s*(\d{1,3})(?:,\s*([01](?:\.\d+)?))?\)$/;Dygraph.toRGB_=function(e){var a=t(e);if(a)return a;var i=document.createElement("div");i.style.backgroundColor=e,i.style.visibility="hidden",document.body.appendChild(i);var r;return r=window.getComputedStyle?window.getComputedStyle(i,null).backgroundColor:i.currentStyle.backgroundColor,document.body.removeChild(i),t(r)},Dygraph.isCanvasSupported=function(t){var e;try{e=t||document.createElement("canvas"),e.getContext("2d")}catch(a){var i=navigator.appVersion.match(/MSIE (\d\.\d)/),r=-1!=navigator.userAgent.toLowerCase().indexOf("opera");return!i||i[1]<6||r?!1:!0}return!0},Dygraph.parseFloat_=function(t,e,a){var i=parseFloat(t);if(!isNaN(i))return i;if(/^ *$/.test(t))return null;if(/^ *nan *$/i.test(t))return 0/0;var r="Unable to parse '"+t+"' as a number";return void 0!==a&&void 0!==e&&(r+=" on line "+(1+(e||0))+" ('"+a+"') of CSV."),console.error(r),null}}(),function(){"use strict";Dygraph.GVizChart=function(t){this.container=t},Dygraph.GVizChart.prototype.draw=function(t,e){this.container.innerHTML="","undefined"!=typeof this.date_graph&&this.date_graph.destroy(),this.date_graph=new Dygraph(this.container,t,e)},Dygraph.GVizChart.prototype.setSelection=function(t){var e=!1;t.length&&(e=t[0].row),this.date_graph.setSelection(e)},Dygraph.GVizChart.prototype.getSelection=function(){var t=[],e=this.date_graph.getSelection();if(0>e)return t;for(var a=this.date_graph.layout_.points,i=0;ii&&2>r&&void 0!==e.lastx_&&-1!=e.lastx_&&Dygraph.Interaction.treatMouseOpAsClick(e,t,a),a.regionWidth=i,a.regionHeight=r},Dygraph.Interaction.startPan=function(t,e,a){var i,r;a.isPanning=!0;var n=e.xAxisRange();if(e.getOptionForAxis("logscale","x")?(a.initialLeftmostDate=Dygraph.log10(n[0]),a.dateRange=Dygraph.log10(n[1])-Dygraph.log10(n[0])):(a.initialLeftmostDate=n[0],a.dateRange=n[1]-n[0]),a.xUnitsPerPixel=a.dateRange/(e.plotter_.area.w-1),e.getNumericOption("panEdgeFraction")){var 
o=e.width_*e.getNumericOption("panEdgeFraction"),s=e.xAxisExtremes(),l=e.toDomXCoord(s[0])-o,h=e.toDomXCoord(s[1])+o,p=e.toDataXCoord(l),g=e.toDataXCoord(h);a.boundedDates=[p,g];var d=[],u=e.height_*e.getNumericOption("panEdgeFraction");for(i=0;ia.boundedDates[1]&&(i-=r-a.boundedDates[1],r=i+a.dateRange),e.getOptionForAxis("logscale","x")?e.dateWindow_=[Math.pow(Dygraph.LOG_SCALE,i),Math.pow(Dygraph.LOG_SCALE,r)]:e.dateWindow_=[i,r],a.is2DPan)for(var n=a.dragEndY-a.dragStartY,o=0;oi?Dygraph.VERTICAL:Dygraph.HORIZONTAL,e.drawZoomRect_(a.dragDirection,a.dragStartX,a.dragEndX,a.dragStartY,a.dragEndY,a.prevDragDirection,a.prevEndX,a.prevEndY),a.prevEndX=a.dragEndX,a.prevEndY=a.dragEndY,a.prevDragDirection=a.dragDirection},Dygraph.Interaction.treatMouseOpAsClick=function(t,e,a){for(var i=t.getFunctionOption("clickCallback"),r=t.getFunctionOption("pointClickCallback"),n=null,o=-1,s=Number.MAX_VALUE,l=0;lp)&&(s=p,o=l)}var g=t.getNumericOption("highlightCircleSize")+2;if(g*g>=s&&(n=t.selPoints_[o]),n){var d={cancelable:!0,point:n,canvasx:a.dragEndX,canvasy:a.dragEndY},u=t.cascadeEvents_("pointClick",d);if(u)return;r&&r.call(t,e,n)}var d={cancelable:!0,xval:t.lastx_,pts:t.selPoints_,canvasx:a.dragEndX,canvasy:a.dragEndY};t.cascadeEvents_("click",d)||i&&i.call(t,e,t.lastx_,t.selPoints_)},Dygraph.Interaction.endZoom=function(t,e,a){e.clearZoomRect_(),a.isZooming=!1,Dygraph.Interaction.maybeTreatMouseOpAsClick(t,e,a);var i=e.getArea();if(a.regionWidth>=10&&a.dragDirection==Dygraph.HORIZONTAL){var r=Math.min(a.dragStartX,a.dragEndX),n=Math.max(a.dragStartX,a.dragEndX);r=Math.max(r,i.x),n=Math.min(n,i.x+i.w),n>r&&e.doZoomX_(r,n),a.cancelNextDblclick=!0}else if(a.regionHeight>=10&&a.dragDirection==Dygraph.VERTICAL){var o=Math.min(a.dragStartY,a.dragEndY),s=Math.max(a.dragStartY,a.dragEndY);o=Math.max(o,i.y),s=Math.min(s,i.y+i.h),s>o&&e.doZoomY_(o,s),a.cancelNextDblclick=!0}a.dragStartX=null,a.dragStartY=null},Dygraph.Interaction.startTouch=function(t,e,a){t.preventDefault(),t.touches.length>1&&(a.startTimeForDoubleTapMs=null);for(var i=[],r=0;r=2){a.initialPinchCenter={pageX:.5*(i[0].pageX+i[1].pageX),pageY:.5*(i[0].pageY+i[1].pageY),dataX:.5*(i[0].dataX+i[1].dataX),dataY:.5*(i[0].dataY+i[1].dataY)};var o=180/Math.PI*Math.atan2(a.initialPinchCenter.pageY-i[0].pageY,i[0].pageX-a.initialPinchCenter.pageX);o=Math.abs(o),o>90&&(o=90-o),a.touchDirections={x:67.5>o,y:o>22.5}}a.initialRange={x:e.xAxisRange(),y:e.yAxisRange()}},Dygraph.Interaction.moveTouch=function(t,e,a){a.startTimeForDoubleTapMs=null;var i,r=[];for(i=0;i=2){var c=s[1].pageX-l.pageX;d=(r[1].pageX-o.pageX)/c;var y=s[1].pageY-l.pageY;u=(r[1].pageY-o.pageY)/y}d=Math.min(8,Math.max(.125,d)),u=Math.min(8,Math.max(.125,u));var _=!1;if(a.touchDirections.x&&(e.dateWindow_=[l.dataX-h.dataX+(a.initialRange.x[0]-l.dataX)/d,l.dataX-h.dataX+(a.initialRange.x[1]-l.dataX)/d],_=!0),a.touchDirections.y)for(i=0;1>i;i++){var v=e.axes_[i],f=e.attributes_.getForAxis("logscale",i);f||(v.valueWindow=[l.dataY-h.dataY+(a.initialRange.y[0]-l.dataY)/u,l.dataY-h.dataY+(a.initialRange.y[1]-l.dataY)/u],_=!0)}if(e.drawGraph_(!1),_&&r.length>1&&e.getFunctionOption("zoomCallback")){var x=e.xAxisRange();e.getFunctionOption("zoomCallback").call(e,x[0],x[1],e.yAxisRanges())}},Dygraph.Interaction.endTouch=function(t,e,a){if(0!==t.touches.length)Dygraph.Interaction.startTouch(t,e,a);else if(1==t.changedTouches.length){var i=(new 
Date).getTime(),r=t.changedTouches[0];a.startTimeForDoubleTapMs&&i-a.startTimeForDoubleTapMs<500&&a.doubleTapX&&Math.abs(a.doubleTapX-r.screenX)<50&&a.doubleTapY&&Math.abs(a.doubleTapY-r.screenY)<50?e.resetZoom():(a.startTimeForDoubleTapMs=i,a.doubleTapX=r.screenX,a.doubleTapY=r.screenY)}};var e=function(t,e,a){return e>t?e-t:t>a?t-a:0},a=function(t,a){var i=Dygraph.findPos(a.canvas_),r={left:i.x,right:i.x+a.canvas_.offsetWidth,top:i.y,bottom:i.y+a.canvas_.offsetHeight},n={x:Dygraph.pageX(t),y:Dygraph.pageY(t)},o=e(n.x,r.left,r.right),s=e(n.y,r.top,r.bottom);return Math.max(o,s)};Dygraph.Interaction.defaultModel={mousedown:function(e,i,r){if(!e.button||2!=e.button){r.initializeMouseDown(e,i,r),e.altKey||e.shiftKey?Dygraph.startPan(e,i,r):Dygraph.startZoom(e,i,r);var n=function(e){if(r.isZooming){var n=a(e,i);t>n?Dygraph.moveZoom(e,i,r):null!==r.dragEndX&&(r.dragEndX=null,r.dragEndY=null,i.clearZoomRect_())}else r.isPanning&&Dygraph.movePan(e,i,r)},o=function(t){r.isZooming?null!==r.dragEndX?Dygraph.endZoom(t,i,r):Dygraph.Interaction.maybeTreatMouseOpAsClick(t,i,r):r.isPanning&&Dygraph.endPan(t,i,r),Dygraph.removeEvent(document,"mousemove",n),Dygraph.removeEvent(document,"mouseup",o),r.destroy()};i.addAndTrackEvent(document,"mousemove",n),i.addAndTrackEvent(document,"mouseup",o)}},willDestroyContextMyself:!0,touchstart:function(t,e,a){Dygraph.Interaction.startTouch(t,e,a)},touchmove:function(t,e,a){Dygraph.Interaction.moveTouch(t,e,a)},touchend:function(t,e,a){Dygraph.Interaction.endTouch(t,e,a)},dblclick:function(t,e,a){if(a.cancelNextDblclick)return void(a.cancelNextDblclick=!1);var i={canvasx:a.dragEndX,canvasy:a.dragEndY};e.cascadeEvents_("dblclick",i)||t.altKey||t.shiftKey||e.resetZoom()}},Dygraph.DEFAULT_ATTRS.interactionModel=Dygraph.Interaction.defaultModel,Dygraph.defaultInteractionModel=Dygraph.Interaction.defaultModel,Dygraph.endZoom=Dygraph.Interaction.endZoom,Dygraph.moveZoom=Dygraph.Interaction.moveZoom,Dygraph.startZoom=Dygraph.Interaction.startZoom,Dygraph.endPan=Dygraph.Interaction.endPan,Dygraph.movePan=Dygraph.Interaction.movePan,Dygraph.startPan=Dygraph.Interaction.startPan,Dygraph.Interaction.nonInteractiveModel_={mousedown:function(t,e,a){a.initializeMouseDown(t,e,a)},mouseup:Dygraph.Interaction.maybeTreatMouseOpAsClick},Dygraph.Interaction.dragIsPanInteractionModel={mousedown:function(t,e,a){a.initializeMouseDown(t,e,a),Dygraph.startPan(t,e,a)},mousemove:function(t,e,a){a.isPanning&&Dygraph.movePan(t,e,a)},mouseup:function(t,e,a){a.isPanning&&Dygraph.endPan(t,e,a)}}}(),function(){"use strict";Dygraph.TickList=void 0,Dygraph.Ticker=void 0,Dygraph.numericLinearTicks=function(t,e,a,i,r,n){var o=function(t){return"logscale"===t?!1:i(t)};return Dygraph.numericTicks(t,e,a,o,r,n)},Dygraph.numericTicks=function(t,e,a,i,r,n){var o,s,l,h,p=i("pixelsPerLabel"),g=[];if(n)for(o=0;o=h/4){for(var y=u;y>=d;y--){var _=Dygraph.PREFERRED_LOG_TICK_VALUES[y],v=Math.log(_/t)/Math.log(e/t)*a,f={v:_};null===c?c={tickValue:_,pixel_coord:v}:Math.abs(v-c.pixel_coord)>=p?c={tickValue:_,pixel_coord:v}:f.label="",g.push(f)}g.reverse()}}if(0===g.length){var x,m,D=i("labelsKMG2");D?(x=[1,2,4,8,16,32,64,128,256],m=16):(x=[1,2,5,10,20,50,100],m=10);var w,A,b,T,E=Math.ceil(a/p),C=Math.abs(e-t)/E,L=Math.floor(Math.log(C)/Math.log(m)),P=Math.pow(m,L);for(s=0;sp));s++);for(A>b&&(w*=-1),o=0;h>=o;o++)l=A+o*w,g.push({v:l})}}var 
S=i("axisLabelFormatter");for(o=0;o=0?Dygraph.getDateAxis(t,e,o,i,r):[]},Dygraph.SECONDLY=0,Dygraph.TWO_SECONDLY=1,Dygraph.FIVE_SECONDLY=2,Dygraph.TEN_SECONDLY=3,Dygraph.THIRTY_SECONDLY=4,Dygraph.MINUTELY=5,Dygraph.TWO_MINUTELY=6,Dygraph.FIVE_MINUTELY=7,Dygraph.TEN_MINUTELY=8,Dygraph.THIRTY_MINUTELY=9,Dygraph.HOURLY=10,Dygraph.TWO_HOURLY=11,Dygraph.SIX_HOURLY=12,Dygraph.DAILY=13,Dygraph.TWO_DAILY=14,Dygraph.WEEKLY=15,Dygraph.MONTHLY=16,Dygraph.QUARTERLY=17,Dygraph.BIANNUAL=18,Dygraph.ANNUAL=19,Dygraph.DECADAL=20,Dygraph.CENTENNIAL=21,Dygraph.NUM_GRANULARITIES=22,Dygraph.DATEFIELD_Y=0,Dygraph.DATEFIELD_M=1,Dygraph.DATEFIELD_D=2,Dygraph.DATEFIELD_HH=3,Dygraph.DATEFIELD_MM=4,Dygraph.DATEFIELD_SS=5,Dygraph.DATEFIELD_MS=6,Dygraph.NUM_DATEFIELDS=7,Dygraph.TICK_PLACEMENT=[],Dygraph.TICK_PLACEMENT[Dygraph.SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:1,spacing:1e3},Dygraph.TICK_PLACEMENT[Dygraph.TWO_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:2,spacing:2e3},Dygraph.TICK_PLACEMENT[Dygraph.FIVE_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:5,spacing:5e3},Dygraph.TICK_PLACEMENT[Dygraph.TEN_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:10,spacing:1e4},Dygraph.TICK_PLACEMENT[Dygraph.THIRTY_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:30,spacing:3e4},Dygraph.TICK_PLACEMENT[Dygraph.MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:1,spacing:6e4},Dygraph.TICK_PLACEMENT[Dygraph.TWO_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:2,spacing:12e4},Dygraph.TICK_PLACEMENT[Dygraph.FIVE_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:5,spacing:3e5},Dygraph.TICK_PLACEMENT[Dygraph.TEN_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:10,spacing:6e5},Dygraph.TICK_PLACEMENT[Dygraph.THIRTY_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:30,spacing:18e5},Dygraph.TICK_PLACEMENT[Dygraph.HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:1,spacing:36e5},Dygraph.TICK_PLACEMENT[Dygraph.TWO_HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:2,spacing:72e5},Dygraph.TICK_PLACEMENT[Dygraph.SIX_HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:6,spacing:216e5},Dygraph.TICK_PLACEMENT[Dygraph.DAILY]={datefield:Dygraph.DATEFIELD_D,step:1,spacing:864e5},Dygraph.TICK_PLACEMENT[Dygraph.TWO_DAILY]={datefield:Dygraph.DATEFIELD_D,step:2,spacing:1728e5},Dygraph.TICK_PLACEMENT[Dygraph.WEEKLY]={datefield:Dygraph.DATEFIELD_D,step:7,spacing:6048e5},Dygraph.TICK_PLACEMENT[Dygraph.MONTHLY]={datefield:Dygraph.DATEFIELD_M,step:1,spacing:2629817280},Dygraph.TICK_PLACEMENT[Dygraph.QUARTERLY]={datefield:Dygraph.DATEFIELD_M,step:3,spacing:216e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.BIANNUAL]={datefield:Dygraph.DATEFIELD_M,step:6,spacing:432e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.ANNUAL]={datefield:Dygraph.DATEFIELD_Y,step:1,spacing:864e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.DECADAL]={datefield:Dygraph.DATEFIELD_Y,step:10,spacing:315578073600},Dygraph.TICK_PLACEMENT[Dygraph.CENTENNIAL]={datefield:Dygraph.DATEFIELD_Y,step:100,spacing:3155780736e3},Dygraph.PREFERRED_LOG_TICK_VALUES=function(){for(var t=[],e=-39;39>=e;e++)for(var a=Math.pow(10,e),i=1;9>=i;i++){var r=a*i;t.push(r)}return t}(),Dygraph.pickDateTickGranularity=function(t,e,a,i){for(var r=i("pixelsPerLabel"),n=0;n=r)return n}return-1},Dygraph.numDateTicks=function(t,e,a){var i=Dygraph.TICK_PLACEMENT[a].spacing;return Math.round(1*(e-t)/i)},Dygraph.getDateAxis=function(t,e,a,i,r){var n=i("axisLabelFormatter"),o=i("labelsUTC"),s=o?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal,l=Dygraph.TICK_PLACEMENT[a].datefield,h=Dygraph.TICK_PLACEMENT[a].step,p=Dygraph.TICK_PLACEMENT[a].spacing,g=new 
Date(t),d=[];d[Dygraph.DATEFIELD_Y]=s.getFullYear(g),d[Dygraph.DATEFIELD_M]=s.getMonth(g),d[Dygraph.DATEFIELD_D]=s.getDate(g),d[Dygraph.DATEFIELD_HH]=s.getHours(g),d[Dygraph.DATEFIELD_MM]=s.getMinutes(g),d[Dygraph.DATEFIELD_SS]=s.getSeconds(g),d[Dygraph.DATEFIELD_MS]=s.getMilliseconds(g);var u=d[l]%h;a==Dygraph.WEEKLY&&(u=s.getDay(g)),d[l]-=u;for(var c=l+1;cv&&(v+=p,_=new Date(v));e>=v;)y.push({v:v,label:n.call(r,_,a,i,r)}),v+=p,_=new Date(v);else for(t>v&&(d[l]+=h,_=s.makeDate.apply(null,d),v=_.getTime());e>=v;)(a>=Dygraph.DAILY||s.getHours(_)%h===0)&&y.push({v:v,label:n.call(r,_,a,i,r)}),d[l]+=h,_=s.makeDate.apply(null,d),v=_.getTime();return y},Dygraph&&Dygraph.DEFAULT_ATTRS&&Dygraph.DEFAULT_ATTRS.axes&&Dygraph.DEFAULT_ATTRS.axes.x&&Dygraph.DEFAULT_ATTRS.axes.y&&Dygraph.DEFAULT_ATTRS.axes.y2&&(Dygraph.DEFAULT_ATTRS.axes.x.ticker=Dygraph.dateTicker,Dygraph.DEFAULT_ATTRS.axes.y.ticker=Dygraph.numericTicks,Dygraph.DEFAULT_ATTRS.axes.y2.ticker=Dygraph.numericTicks)}(),Dygraph.Plugins={},Dygraph.Plugins.Annotations=function(){"use strict";var t=function(){this.annotations_=[]};return t.prototype.toString=function(){return"Annotations Plugin"},t.prototype.activate=function(t){return{clearChart:this.clearChart,didDrawChart:this.didDrawChart}},t.prototype.detachLabels=function(){for(var t=0;to.x+o.w||h.canvasyo.y+o.h)){var p=h.annotation,g=6;p.hasOwnProperty("tickHeight")&&(g=p.tickHeight);var d=document.createElement("div");for(var u in r)r.hasOwnProperty(u)&&(d.style[u]=r[u]);p.hasOwnProperty("icon")||(d.className="dygraphDefaultAnnotation"),p.hasOwnProperty("cssClass")&&(d.className+=" "+p.cssClass);var c=p.hasOwnProperty("width")?p.width:16,y=p.hasOwnProperty("height")?p.height:16;if(p.hasOwnProperty("icon")){var _=document.createElement("img");_.src=p.icon,_.width=c,_.height=y,d.appendChild(_)}else h.annotation.hasOwnProperty("shortText")&&d.appendChild(document.createTextNode(h.annotation.shortText));var v=h.canvasx-c/2;d.style.left=v+"px";var f=0;if(p.attachAtBottom){var x=o.y+o.h-y-g;s[v]?x-=s[v]:s[v]=0,s[v]+=g+y,f=x}else f=h.canvasy-y-g;d.style.top=f+"px",d.style.width=c+"px",d.style.height=y+"px",d.title=h.annotation.text,d.style.color=e.colorsMap_[h.name],d.style.borderColor=e.colorsMap_[h.name],p.div=d,e.addAndTrackEvent(d,"click",n("clickHandler","annotationClickHandler",h,this)),e.addAndTrackEvent(d,"mouseover",n("mouseOverHandler","annotationMouseOverHandler",h,this)),e.addAndTrackEvent(d,"mouseout",n("mouseOutHandler","annotationMouseOutHandler",h,this)),e.addAndTrackEvent(d,"dblclick",n("dblClickHandler","annotationDblClickHandler",h,this)),i.appendChild(d),this.annotations_.push(d);var m=t.drawingContext;if(m.save(),m.strokeStyle=e.colorsMap_[h.name],m.beginPath(),p.attachAtBottom){var x=f+y;m.moveTo(h.canvasx,x),m.lineTo(h.canvasx,x+g)}else m.moveTo(h.canvasx,h.canvasy),m.lineTo(h.canvasx,h.canvasy-2-g);m.closePath(),m.stroke(),m.restore()}}},t.prototype.destroy=function(){this.detachLabels()},t}(),Dygraph.Plugins.Axes=function(){"use strict";var t=function(){this.xlabels_=[],this.ylabels_=[]};return t.prototype.toString=function(){return"Axes Plugin"},t.prototype.activate=function(t){return{layout:this.layout,clearChart:this.clearChart,willDrawChart:this.willDrawChart}},t.prototype.layout=function(t){var e=t.dygraph;if(e.getOptionForAxis("drawAxis","y")){var a=e.getOptionForAxis("axisLabelWidth","y")+2*e.getOptionForAxis("axisTickSize","y");t.reserveSpaceLeft(a)}if(e.getOptionForAxis("drawAxis","x")){var 
i;i=e.getOption("xAxisHeight")?e.getOption("xAxisHeight"):e.getOptionForAxis("axisLabelFontSize","x")+2*e.getOptionForAxis("axisTickSize","x"),t.reserveSpaceBottom(i)}if(2==e.numAxes()){if(e.getOptionForAxis("drawAxis","y2")){var a=e.getOptionForAxis("axisLabelWidth","y2")+2*e.getOptionForAxis("axisTickSize","y2");t.reserveSpaceRight(a)}}else e.numAxes()>2&&e.error("Only two y-axes are supported at this time. (Trying to use "+e.numAxes()+")")},t.prototype.detachLabels=function(){function t(t){for(var e=0;e0){var x=i.numAxes(),m=[f("y"),f("y2")];for(l=0;l<_.yticks.length;l++){if(s=_.yticks[l],"function"==typeof s)return;n=v.x;var D=1,w="y1",A=m[0];1==s[0]&&(n=v.x+v.w,D=-1,w="y2",A=m[1]);var b=A("axisLabelFontSize");o=v.y+s[1]*v.h,r=y(s[2],"y",2==x?w:null);var T=o-b/2;0>T&&(T=0),T+b+3>d?r.style.bottom="0":r.style.top=T+"px",0===s[0]?(r.style.left=v.x-A("axisLabelWidth")-A("axisTickSize")+"px",r.style.textAlign="right"):1==s[0]&&(r.style.left=v.x+v.w+A("axisTickSize")+"px",r.style.textAlign="left"),r.style.width=A("axisLabelWidth")+"px",p.appendChild(r),this.ylabels_.push(r)}var E=this.ylabels_[0],b=i.getOptionForAxis("axisLabelFontSize","y"),C=parseInt(E.style.top,10)+b;C>d-b&&(E.style.top=parseInt(E.style.top,10)-b/2+"px")}var L;if(i.getOption("drawAxesAtZero")){var P=i.toPercentXCoord(0);(P>1||0>P||isNaN(P))&&(P=0),L=e(v.x+P*v.w)}else L=e(v.x);h.strokeStyle=i.getOptionForAxis("axisLineColor","y"),h.lineWidth=i.getOptionForAxis("axisLineWidth","y"),h.beginPath(),h.moveTo(L,a(v.y)),h.lineTo(L,a(v.y+v.h)),h.closePath(),h.stroke(),2==i.numAxes()&&(h.strokeStyle=i.getOptionForAxis("axisLineColor","y2"),h.lineWidth=i.getOptionForAxis("axisLineWidth","y2"),h.beginPath(),h.moveTo(a(v.x+v.w),a(v.y)),h.lineTo(a(v.x+v.w),a(v.y+v.h)),h.closePath(),h.stroke())}if(i.getOptionForAxis("drawAxis","x")){if(_.xticks){var A=f("x");for(l=0;l<_.xticks.length;l++){s=_.xticks[l],n=v.x+s[0]*v.w,o=v.y+v.h,r=y(s[1],"x"),r.style.textAlign="center",r.style.top=o+A("axisTickSize")+"px";var S=n-A("axisLabelWidth")/2;S+A("axisLabelWidth")>g&&(S=g-A("axisLabelWidth"),r.style.textAlign="right"),0>S&&(S=0,r.style.textAlign="left"),r.style.left=S+"px",r.style.width=A("axisLabelWidth")+"px", +p.appendChild(r),this.xlabels_.push(r)}}h.strokeStyle=i.getOptionForAxis("axisLineColor","x"),h.lineWidth=i.getOptionForAxis("axisLineWidth","x"),h.beginPath();var O;if(i.getOption("drawAxesAtZero")){var P=i.toPercentYCoord(0,0);(P>1||0>P)&&(P=1),O=a(v.y+P*v.h)}else O=a(v.y+v.h);h.moveTo(e(v.x),O),h.lineTo(e(v.x+v.w),O),h.closePath(),h.stroke()}h.restore()}},t}(),Dygraph.Plugins.ChartLabels=function(){"use strict";var t=function(){this.title_div_=null,this.xlabel_div_=null,this.ylabel_div_=null,this.y2label_div_=null};t.prototype.toString=function(){return"ChartLabels Plugin"},t.prototype.activate=function(t){return{layout:this.layout,didDrawChart:this.didDrawChart}};var e=function(t){var e=document.createElement("div");return e.style.position="absolute",e.style.left=t.x+"px",e.style.top=t.y+"px",e.style.width=t.w+"px",e.style.height=t.h+"px",e};t.prototype.detachLabels_=function(){for(var t=[this.title_div_,this.xlabel_div_,this.ylabel_div_,this.y2label_div_],e=0;e=2);for(o=h.yticks,l.save(),n=0;n=2;for(y&&l.installPattern(_),l.strokeStyle=s.getOptionForAxis("gridLineColor","x"),l.lineWidth=s.getOptionForAxis("gridLineWidth","x"),n=0;n/g,">")};return t.prototype.select=function(e){var a=e.selectedX,i=e.selectedPoints,r=e.selectedRow,n=e.dygraph.getOption("legend");if("never"===n)return 
void(this.legend_div_.style.display="none");if("follow"===n){var o=e.dygraph.plotter_.area,s=e.dygraph.getOption("labelsDivWidth"),l=e.dygraph.getOptionForAxis("axisLabelWidth","y"),h=i[0].x*o.w+20,p=i[0].y*o.h-20;h+s+1>window.scrollX+window.innerWidth&&(h=h-40-s-(l-o.x)),e.dygraph.graphDiv.appendChild(this.legend_div_),this.legend_div_.style.left=l+h+"px",this.legend_div_.style.top=p+"px"}var g=t.generateLegendHTML(e.dygraph,a,i,this.one_em_width_,r);this.legend_div_.innerHTML=g,this.legend_div_.style.display=""},t.prototype.deselect=function(e){var i=e.dygraph.getOption("legend");"always"!==i&&(this.legend_div_.style.display="none");var r=a(this.legend_div_);this.one_em_width_=r;var n=t.generateLegendHTML(e.dygraph,void 0,void 0,r,null);this.legend_div_.innerHTML=n},t.prototype.didDrawChart=function(t){this.deselect(t)},t.prototype.predraw=function(t){if(this.is_generated_div_){t.dygraph.graphDiv.appendChild(this.legend_div_);var e=t.dygraph.plotter_.area,a=t.dygraph.getOption("labelsDivWidth");this.legend_div_.style.left=e.x+e.w-a-1+"px",this.legend_div_.style.top=e.y+"px",this.legend_div_.style.width=a+"px"}},t.prototype.destroy=function(){this.legend_div_=null},t.generateLegendHTML=function(t,a,r,n,o){if(t.getOption("showLabelsOnHighlight")!==!0)return"";var s,l,h,p,g,d=t.getLabels();if("undefined"==typeof a){if("always"!=t.getOption("legend"))return"";for(l=t.getOption("labelsSeparateLines"),s="",h=1;h":" "),g=t.getOption("strokePattern",d[h]),p=e(g,u.color,n),s+=""+p+" "+i(d[h])+"")}return s}var c=t.optionsViewForAxis_("x"),y=c("valueFormatter");s=y.call(t,a,c,d[0],t,o,0),""!==s&&(s+=":");var _=[],v=t.numAxes();for(h=0;v>h;h++)_[h]=t.optionsViewForAxis_("y"+(h?1+h:""));var f=t.getOption("labelsShowZeroValues");l=t.getOption("labelsSeparateLines");var x=t.getHighlightSeries();for(h=0;h");var u=t.getPropertiesForSeries(m.name),D=_[u.axis-1],w=D("valueFormatter"),A=w.call(t,m.yval,D,m.name,t,o,d.indexOf(m.name)),b=m.name==x?" class='highlight'":"";s+=" "+i(m.name)+": "+A+""}}return s},e=function(t,e,a){var i=/MSIE/.test(navigator.userAgent)&&!window.opera;if(i)return"—";if(!t||t.length<=1)return'
      ';var r,n,o,s,l,h=0,p=0,g=[];for(r=0;r<=t.length;r++)h+=t[r%t.length];if(l=Math.floor(a/(h-t[0])),l>1){for(r=0;rn;n++)for(r=0;p>r;r+=2)o=g[r%g.length],s=r';return d},t}(),Dygraph.Plugins.RangeSelector=function(){"use strict";var t=function(){this.isIE_=/MSIE/.test(navigator.userAgent)&&!window.opera,this.hasTouchInterface_="undefined"!=typeof TouchEvent,this.isMobileDevice_=/mobile|android/gi.test(navigator.appVersion),this.interfaceCreated_=!1};return t.prototype.toString=function(){return"RangeSelector Plugin"},t.prototype.activate=function(t){return this.dygraph_=t,this.isUsingExcanvas_=t.isUsingExcanvas_,this.getOption_("showRangeSelector")&&this.createInterface_(),{layout:this.reserveSpace_,predraw:this.renderStaticLayer_,didDrawChart:this.renderInteractiveLayer_}},t.prototype.destroy=function(){this.bgcanvas_=null,this.fgcanvas_=null,this.leftZoomHandle_=null,this.rightZoomHandle_=null,this.iePanOverlay_=null},t.prototype.getOption_=function(t,e){return this.dygraph_.getOption(t,e)},t.prototype.setDefaultOption_=function(t,e){this.dygraph_.attrs_[t]=e},t.prototype.createInterface_=function(){this.createCanvases_(),this.isUsingExcanvas_&&this.createIEPanOverlay_(),this.createZoomHandles_(),this.initInteraction_(),this.getOption_("animatedZooms")&&(console.warn("Animated zooms and range selector are not compatible; disabling animatedZooms."),this.dygraph_.updateOptions({animatedZooms:!1},!0)),this.interfaceCreated_=!0,this.addToGraph_()},t.prototype.addToGraph_=function(){var t=this.graphDiv_=this.dygraph_.graphDiv;t.appendChild(this.bgcanvas_),t.appendChild(this.fgcanvas_),t.appendChild(this.leftZoomHandle_),t.appendChild(this.rightZoomHandle_)},t.prototype.removeFromGraph_=function(){var t=this.graphDiv_;t.removeChild(this.bgcanvas_),t.removeChild(this.fgcanvas_),t.removeChild(this.leftZoomHandle_),t.removeChild(this.rightZoomHandle_),this.graphDiv_=null},t.prototype.reserveSpace_=function(t){this.getOption_("showRangeSelector")&&t.reserveSpaceBottom(this.getOption_("rangeSelectorHeight")+4)},t.prototype.renderStaticLayer_=function(){this.updateVisibility_()&&(this.resize_(),this.drawStaticLayer_())},t.prototype.renderInteractiveLayer_=function(){this.updateVisibility_()&&!this.isChangingRange_&&(this.placeZoomHandles_(),this.drawInteractiveLayer_())},t.prototype.updateVisibility_=function(){var t=this.getOption_("showRangeSelector");if(t)this.interfaceCreated_?this.graphDiv_&&this.graphDiv_.parentNode||this.addToGraph_():this.createInterface_();else if(this.graphDiv_){this.removeFromGraph_();var e=this.dygraph_;setTimeout(function(){e.width_=0,e.resize()},1)}return t},t.prototype.resize_=function(){function t(t,e,a){var i=Dygraph.getContextPixelRatio(e);t.style.top=a.y+"px",t.style.left=a.x+"px",t.width=a.w*i,t.height=a.h*i,t.style.width=a.w+"px",t.style.height=a.h+"px",1!=i&&e.scale(i,i)}var 
e=this.dygraph_.layout_.getPlotArea(),a=0;this.dygraph_.getOptionForAxis("drawAxis","x")&&(a=this.getOption_("xAxisHeight")||this.getOption_("axisLabelFontSize")+2*this.getOption_("axisTickSize")),this.canvasRect_={x:e.x,y:e.y+e.h+a+4,w:e.w,h:this.getOption_("rangeSelectorHeight")},t(this.bgcanvas_,this.bgcanvas_ctx_,this.canvasRect_),t(this.fgcanvas_,this.fgcanvas_ctx_,this.canvasRect_)},t.prototype.createCanvases_=function(){this.bgcanvas_=Dygraph.createCanvas(),this.bgcanvas_.className="dygraph-rangesel-bgcanvas",this.bgcanvas_.style.position="absolute",this.bgcanvas_.style.zIndex=9,this.bgcanvas_ctx_=Dygraph.getContext(this.bgcanvas_),this.fgcanvas_=Dygraph.createCanvas(),this.fgcanvas_.className="dygraph-rangesel-fgcanvas",this.fgcanvas_.style.position="absolute",this.fgcanvas_.style.zIndex=9,this.fgcanvas_.style.cursor="default",this.fgcanvas_ctx_=Dygraph.getContext(this.fgcanvas_)},t.prototype.createIEPanOverlay_=function(){this.iePanOverlay_=document.createElement("div"),this.iePanOverlay_.style.position="absolute",this.iePanOverlay_.style.backgroundColor="white",this.iePanOverlay_.style.filter="alpha(opacity=0)",this.iePanOverlay_.style.display="none",this.iePanOverlay_.style.cursor="move",this.fgcanvas_.appendChild(this.iePanOverlay_)},t.prototype.createZoomHandles_=function(){var t=new Image;t.className="dygraph-rangesel-zoomhandle",t.style.position="absolute",t.style.zIndex=10,t.style.visibility="hidden",t.style.cursor="col-resize",/MSIE 7/.test(navigator.userAgent)?(t.width=7,t.height=14,t.style.backgroundColor="white",t.style.border="1px solid #333333"):(t.width=9,t.height=16,t.src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAkAAAAQCAYAAADESFVDAAAAAXNSR0IArs4c6QAAAAZiS0dEANAAzwDP4Z7KegAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAAd0SU1FB9sHGw0cMqdt1UwAAAAZdEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIEdJTVBXgQ4XAAAAaElEQVQoz+3SsRFAQBCF4Z9WJM8KCDVwownl6YXsTmCUsyKGkZzcl7zkz3YLkypgAnreFmDEpHkIwVOMfpdi9CEEN2nGpFdwD03yEqDtOgCaun7sqSTDH32I1pQA2Pb9sZecAxc5r3IAb21d6878xsAAAAAASUVORK5CYII="),this.isMobileDevice_&&(t.width*=2,t.height*=2),this.leftZoomHandle_=t,this.rightZoomHandle_=t.cloneNode(!1)},t.prototype.initInteraction_=function(){var t,e,a,i,r,n,o,s,l,h,p,g,d,u,c=this,y=document,_=0,v=null,f=!1,x=!1,m=!this.isMobileDevice_&&!this.isUsingExcanvas_,D=new Dygraph.IFrameTarp;t=function(t){var e=c.dygraph_.xAxisExtremes(),a=(e[1]-e[0])/c.canvasRect_.w,i=e[0]+(t.leftHandlePos-c.canvasRect_.x)*a,r=e[0]+(t.rightHandlePos-c.canvasRect_.x)*a;return[i,r]},e=function(t){return Dygraph.cancelEvent(t),f=!0,_=t.clientX,v=t.target?t.target:t.srcElement,("mousedown"===t.type||"dragstart"===t.type)&&(Dygraph.addEvent(y,"mousemove",a),Dygraph.addEvent(y,"mouseup",i)),c.fgcanvas_.style.cursor="col-resize",D.cover(),!0},a=function(t){if(!f)return!1;Dygraph.cancelEvent(t);var e=t.clientX-_;if(Math.abs(e)<4)return!0;_=t.clientX;var a,i=c.getZoomHandleStatus_();v==c.leftZoomHandle_?(a=i.leftHandlePos+e,a=Math.min(a,i.rightHandlePos-v.width-3),a=Math.max(a,c.canvasRect_.x)):(a=i.rightHandlePos+e,a=Math.min(a,c.canvasRect_.x+c.canvasRect_.w),a=Math.max(a,i.leftHandlePos+v.width+3));var n=v.width/2;return v.style.left=a-n+"px",c.drawInteractiveLayer_(),m&&r(),!0},i=function(t){return f?(f=!1,D.uncover(),Dygraph.removeEvent(y,"mousemove",a),Dygraph.removeEvent(y,"mouseup",i),c.fgcanvas_.style.cursor="default",m||r(),!0):!1},r=function(){try{var e=c.getZoomHandleStatus_();if(c.isChangingRange_=!0,e.isZoomed){var a=t(e);c.dygraph_.doZoomXDates_(a[0],a[1])}else 
c.dygraph_.resetZoom()}finally{c.isChangingRange_=!1}},n=function(t){if(c.isUsingExcanvas_)return t.srcElement==c.iePanOverlay_;var e=c.leftZoomHandle_.getBoundingClientRect(),a=e.left+e.width/2;e=c.rightZoomHandle_.getBoundingClientRect();var i=e.left+e.width/2;return t.clientX>a&&t.clientX=c.canvasRect_.x+c.canvasRect_.w?(r=c.canvasRect_.x+c.canvasRect_.w,i=r-n):(i+=e,r+=e);var o=c.leftZoomHandle_.width/2;return c.leftZoomHandle_.style.left=i-o+"px",c.rightZoomHandle_.style.left=r-o+"px",c.drawInteractiveLayer_(),m&&h(),!0},l=function(t){return x?(x=!1,Dygraph.removeEvent(y,"mousemove",s),Dygraph.removeEvent(y,"mouseup",l),m||h(),!0):!1},h=function(){try{c.isChangingRange_=!0,c.dygraph_.dateWindow_=t(c.getZoomHandleStatus_()),c.dygraph_.drawGraph_(!1)}finally{c.isChangingRange_=!1}},p=function(t){if(!f&&!x){var e=n(t)?"move":"default";e!=c.fgcanvas_.style.cursor&&(c.fgcanvas_.style.cursor=e)}},g=function(t){"touchstart"==t.type&&1==t.targetTouches.length?e(t.targetTouches[0])&&Dygraph.cancelEvent(t):"touchmove"==t.type&&1==t.targetTouches.length?a(t.targetTouches[0])&&Dygraph.cancelEvent(t):i(t)},d=function(t){"touchstart"==t.type&&1==t.targetTouches.length?o(t.targetTouches[0])&&Dygraph.cancelEvent(t):"touchmove"==t.type&&1==t.targetTouches.length?s(t.targetTouches[0])&&Dygraph.cancelEvent(t):l(t)},u=function(t,e){for(var a=["touchstart","touchend","touchmove","touchcancel"],i=0;it;t++){var s=this.getOption_("showInRangeSelector",r[t]);n[t]=s,null!==s&&(o=!0)}if(!o)for(t=0;t1&&(g=h.rollingAverage(g,e.rollPeriod(),p)),l.push(g)}var d=[];for(t=0;t0)&&(v=Math.min(v,x),f=Math.max(f,x))}var m=.25;if(a)for(f=Dygraph.log10(f),f+=f*m,v=Dygraph.log10(v),t=0;tthis.canvasRect_.x||a+10&&t[r][0]>o;)i--,r--}return i>=a?[a,i]:[0,t.length-1]},t.parseFloat=function(t){return null===t?0/0:t}}(),function(){"use strict";Dygraph.DataHandlers.DefaultHandler=function(){};var t=Dygraph.DataHandlers.DefaultHandler;t.prototype=new Dygraph.DataHandler,t.prototype.extractSeries=function(t,e,a){for(var i=[],r=a.get("logscale"),n=0;n=s&&(s=null),i.push([o,s])}return i},t.prototype.rollingAverage=function(t,e,a){e=Math.min(e,t.length);var i,r,n,o,s,l=[];if(1==e)return t;for(i=0;ir;r++)n=t[r][1],null===n||isNaN(n)||(s++,o+=t[r][1]);s?l[i]=[t[i][0],o/s]:l[i]=[t[i][0],null]}return l},t.prototype.getExtremeYValues=function(t,e,a){for(var i,r=null,n=null,o=0,s=t.length-1,l=o;s>=l;l++)i=t[l][1],null===i||isNaN(i)||((null===n||i>n)&&(n=i),(null===r||r>i)&&(r=i));return[r,n]}}(),function(){"use strict";Dygraph.DataHandlers.DefaultFractionHandler=function(){};var t=Dygraph.DataHandlers.DefaultFractionHandler;t.prototype=new Dygraph.DataHandlers.DefaultHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s,l,h=[],p=100,g=a.get("logscale"),d=0;d=0&&(n-=t[i-e][2][0],o-=t[i-e][2][1]);var l=t[i][0],h=o?n/o:0;r[i]=[l,s*h]}return r}}(),function(){"use strict";Dygraph.DataHandlers.BarsHandler=function(){Dygraph.DataHandler.call(this)},Dygraph.DataHandlers.BarsHandler.prototype=new Dygraph.DataHandler;var t=Dygraph.DataHandlers.BarsHandler;t.prototype.extractSeries=function(t,e,a){},t.prototype.rollingAverage=function(t,e,a){},t.prototype.onPointsCreated_=function(t,e){for(var a=0;a=l;l++)if(i=t[l][1],null!==i&&!isNaN(i)){var h=t[l][2][0],p=t[l][2][1];h>i&&(h=i),i>p&&(p=i),(null===n||p>n)&&(n=p),(null===r||r>h)&&(r=h)}return[r,n]},t.prototype.onLineEvaluated=function(t,e,a){for(var i,r=0;r=0){var 
g=t[l-e];null===g[1]||isNaN(g[1])||(r-=g[2][0],o-=g[1],n-=g[2][1],s-=1)}s?p[l]=[t[l][0],1*o/s,[1*r/s,1*n/s]]:p[l]=[t[l][0],null,[null,null]]}return p}}(),function(){"use strict";Dygraph.DataHandlers.ErrorBarsHandler=function(){};var t=Dygraph.DataHandlers.ErrorBarsHandler;t.prototype=new Dygraph.DataHandlers.BarsHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s=[],l=a.get("sigma"),h=a.get("logscale"),p=0;pr;r++)n=t[r][1],null===n||isNaN(n)||(l++,s+=n,p+=Math.pow(t[r][2][2],2));l?(h=Math.sqrt(p)/l,g=s/l,d[i]=[t[i][0],g,[g-u*h,g+u*h]]):(o=1==e?t[i][1]:null,d[i]=[t[i][0],o,[o,o]])}return d}}(),function(){"use strict";Dygraph.DataHandlers.FractionsBarsHandler=function(){};var t=Dygraph.DataHandlers.FractionsBarsHandler;t.prototype=new Dygraph.DataHandlers.BarsHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s,l,h,p,g=[],d=100,u=a.get("sigma"),c=a.get("logscale"),y=0;y=0&&(p-=t[n-e][2][2],g-=t[n-e][2][3]);var u=t[n][0],c=g?p/g:0;if(h)if(g){var y=0>c?0:c,_=g,v=l*Math.sqrt(y*(1-y)/_+l*l/(4*_*_)),f=1+l*l/g;i=(y+l*l/(2*g)-v)/f,r=(y+l*l/(2*g)+v)/f,s[n]=[u,y*d,[i*d,r*d]]}else s[n]=[u,0,[0,0]];else o=g?l*Math.sqrt(c*(1-c)/g):1,s[n]=[u,d*c,[d*(c-o),d*(c+o)]]}return s}}(); +//# sourceMappingURL=dygraph-combined.js.map \ No newline at end of file diff --git a/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css b/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css new file mode 100644 index 000000000..4745b2fc2 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css @@ -0,0 +1,8 @@ + +div .dygraphs input[type="text"] { + width: 25px; +} + +div .qt .dygraph-axis-label { + font-size: 11px; +} \ No newline at end of file diff --git a/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js b/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js new file mode 100644 index 000000000..2df07a9b8 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js @@ -0,0 +1,123 @@ +/** + * @license + * Copyright 2011 Dan Vanderkam (danvdk@gmail.com) + * MIT-licensed (http://opensource.org/licenses/MIT) + */ + +/** + * @fileoverview + * Including this file will add several additional shapes to Dygraph.Circles + * which can be passed to drawPointCallback. + * See tests/custom-circles.html for usage. + */ + +(function() { + +/** + * @param {!CanvasRenderingContext2D} ctx the canvas context + * @param {number} sides the number of sides in the shape. + * @param {number} radius the radius of the image. + * @param {number} cx center x coordate + * @param {number} cy center y coordinate + * @param {number=} rotationRadians the shift of the initial angle, in radians. + * @param {number=} delta the angle shift for each line. If missing, creates a + * regular polygon. + */ +var regularShape = function( + ctx, sides, radius, cx, cy, rotationRadians, delta) { + rotationRadians = rotationRadians || 0; + delta = delta || Math.PI * 2 / sides; + + ctx.beginPath(); + var initialAngle = rotationRadians; + var angle = initialAngle; + + var computeCoordinates = function() { + var x = cx + (Math.sin(angle) * radius); + var y = cy + (-Math.cos(angle) * radius); + return [x, y]; + }; + + var initialCoordinates = computeCoordinates(); + var x = initialCoordinates[0]; + var y = initialCoordinates[1]; + ctx.moveTo(x, y); + + for (var idx = 0; idx < sides; idx++) { + angle = (idx == sides - 1) ? 
initialAngle : (angle + delta); + var coords = computeCoordinates(); + ctx.lineTo(coords[0], coords[1]); + } + ctx.fill(); + ctx.stroke(); +}; + +/** + * TODO(danvk): be more specific on the return type. + * @param {number} sides + * @param {number=} rotationRadians + * @param {number=} delta + * @return {Function} + * @private + */ +var shapeFunction = function(sides, rotationRadians, delta) { + return function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + ctx.fillStyle = "white"; + regularShape(ctx, sides, radius, cx, cy, rotationRadians, delta); + }; +}; + +var customCircles = { + TRIANGLE : shapeFunction(3), + SQUARE : shapeFunction(4, Math.PI / 4), + DIAMOND : shapeFunction(4), + PENTAGON : shapeFunction(5), + HEXAGON : shapeFunction(6), + CIRCLE : function(g, name, ctx, cx, cy, color, radius) { + ctx.beginPath(); + ctx.strokeStyle = color; + ctx.fillStyle = "white"; + ctx.arc(cx, cy, radius, 0, 2 * Math.PI, false); + ctx.fill(); + ctx.stroke(); + }, + STAR : shapeFunction(5, 0, 4 * Math.PI / 5), + PLUS : function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy); + ctx.lineTo(cx - radius, cy); + ctx.closePath(); + ctx.stroke(); + + ctx.beginPath(); + ctx.moveTo(cx, cy + radius); + ctx.lineTo(cx, cy - radius); + ctx.closePath(); + ctx.stroke(); + }, + EX : function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy + radius); + ctx.lineTo(cx - radius, cy - radius); + ctx.closePath(); + ctx.stroke(); + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy - radius); + ctx.lineTo(cx - radius, cy + radius); + ctx.closePath(); + ctx.stroke(); + } +}; + +for (var k in customCircles) { + if (!customCircles.hasOwnProperty(k)) continue; + Dygraph.Circles[k] = customCircles[k]; +} + +})(); diff --git a/docs/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js b/docs/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js new file mode 100644 index 000000000..3cd03913f --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js @@ -0,0 +1,789 @@ + +// polyfill indexOf for IE8 +if (!Array.prototype.indexOf) { + Array.prototype.indexOf = function(elt /*, from*/) { + var len = this.length >>> 0; + + var from = Number(arguments[1]) || 0; + from = (from < 0) + ? 
Math.ceil(from) + : Math.floor(from); + if (from < 0) + from += len; + + for (; from < len; from++) { + if (from in this && + this[from] === elt) + return from; + } + return -1; + }; +} + +HTMLWidgets.widget({ + + name: "dygraphs", + + type: "output", + + factory: function(el, width, height) { + + // reference to dygraph + var dygraph = null; + + // reference to widget global groups + var groups = this.groups; + + // add qt style if we are running under Qt + if (window.navigator.userAgent.indexOf(" Qt/") > 0) + el.className += " qt"; + + return { + + renderValue: function(x) { + + // reference to this for closures + var thiz = this; + + // get dygraph attrs and populate file field + var attrs = x.attrs; + attrs.file = x.data; + + // disable zoom interaction except for clicks + if (attrs.disableZoom) { + attrs.interactionModel = Dygraph.Interaction.nonInteractiveModel_; + } + + // convert non-arrays to arrays + for (var index = 0; index < attrs.file.length; index++) { + if (!$.isArray(attrs.file[index])) + attrs.file[index] = [].concat(attrs.file[index]); + } + + // resolve "auto" legend behavior + if (x.attrs.legend == "auto") { + if (x.data.length <= 2) + x.attrs.legend = "onmouseover"; + else + x.attrs.legend = "always"; + } + + if (x.format == "date") { + + // set appropriated function in case of fixed tz + if ((attrs.axes.x.axisLabelFormatter === undefined) && x.fixedtz) + attrs.axes.x.axisLabelFormatter = this.xAxisLabelFormatterFixedTZ(x.tzone); + + if ((attrs.axes.x.valueFormatter === undefined) && x.fixedtz) + attrs.axes.x.valueFormatter = this.xValueFormatterFixedTZ(x.scale, x.tzone); + + if ((attrs.axes.x.ticker === undefined) && x.fixedtz) + attrs.axes.x.ticker = this.customDateTickerFixedTZ(x.tzone); + + // provide an automatic x value formatter if none is already specified + if ((attrs.axes.x.valueFormatter === undefined) && (x.fixedtz != true)) + attrs.axes.x.valueFormatter = this.xValueFormatter(x.scale); + + // convert time to js time + attrs.file[0] = attrs.file[0].map(function(value) { + return thiz.normalizeDateValue(x.scale, value, x.fixedtz); + }); + if (attrs.dateWindow != null) { + attrs.dateWindow = attrs.dateWindow.map(function(value) { + var date = thiz.normalizeDateValue(x.scale, value, x.fixedtz); + return date.getTime(); + }); + } + } + + + // transpose array + attrs.file = HTMLWidgets.transposeArray2D(attrs.file); + + // add drawCallback for group + if (x.group != null) + this.addGroupDrawCallback(x); + + // add shading and event callback if necessary + this.addShadingCallback(x); + this.addEventCallback(x); + this.addZoomCallback(x); + + // disable y-axis touch events on mobile phones + if (attrs.mobileDisableYTouch !== false && this.isMobilePhone()) { + // create default interaction model if necessary + if (!attrs.interactionModel) + attrs.interactionModel = Dygraph.Interaction.defaultModel; + // disable y touch direction + attrs.interactionModel.touchstart = function(event, dygraph, context) { + Dygraph.defaultInteractionModel.touchstart(event, dygraph, context); + context.touchDirections = { x: true, y: false }; + }; + } + + // create plugins + if (x.plugins) { + attrs.plugins = []; + for (var plugin in x.plugins) { + if (x.plugins.hasOwnProperty(plugin)) { + + // get plugin options + var options = x.plugins[plugin]; + + // create plugin and add to dygraph + var p = new Dygraph.Plugins[plugin](options); + attrs.plugins.push(p); + } + } + } + + // custom plotter + if (x.plotter) { + attrs.plotter = Dygraph.Plotters[x.plotter]; + } + + // custom data handler 
+ if (x.dataHandler) { + attrs.dataHandler = Dygraph.DataHandlers[x.dataHandler]; + } + + // custom circles + if (x.pointShape) { + if (typeof x.pointShape === 'string') { + attrs.drawPointCallback = Dygraph.Circles[x.pointShape.toUpperCase()]; + attrs.drawHighlightPointCallback = Dygraph.Circles[x.pointShape.toUpperCase()]; + } else { + for (var s in x.pointShape) { + if (x.pointShape.hasOwnProperty(s)) { + attrs.series[s].drawPointCallback = Dygraph.Circles[x.pointShape[s].toUpperCase()]; + attrs.series[s].drawHighlightPointCallback = Dygraph.Circles[x.pointShape[s].toUpperCase()]; + } + } + } + } + + // if there is no existing dygraph perform initialization + if (!dygraph) { + + // subscribe to custom shown event (fired by ioslides to trigger + // shiny reactivity but we can use it as well). this is necessary + // because if a dygraph starts out as display:none it has height + // and width == 0 and this doesn't change when it becomes visible + $(el).closest('slide').on('shown', function() { + if (dygraph) + dygraph.resize(); + }); + + // do the same for reveal.js + $(el).closest('section.slide').on('shown', function() { + if (dygraph) + dygraph.resize(); + }); + + // redraw on R Markdown {.tabset} tab visibility changed + var tab = $(el).closest('div.tab-pane'); + if (tab !== null) { + var tabID = tab.attr('id'); + var tabAnchor = $('a[data-toggle="tab"][href="#' + tabID + '"]'); + if (tabAnchor !== null) { + tabAnchor.on('shown.bs.tab', function() { + if (dygraph) + dygraph.resize(); + }); + } + } + // add default font for viewer mode + if (this.queryVar("viewer_pane") === "1") + document.body.style.fontFamily = "Arial, sans-serif"; + + // inject css if necessary + if (x.css != null) { + var style = document.createElement('style'); + style.type = 'text/css'; + if (style.styleSheet) + style.styleSheet.cssText = x.css; + else + style.appendChild(document.createTextNode(x.css)); + document.getElementsByTagName("head")[0].appendChild(style); + } + + } else { + + // retain the userDateWindow if requested + if (dygraph.userDateWindow != null + && attrs.retainDateWindow == true) { + attrs.dateWindow = dygraph.xAxisRange(); + } + + // remove it from groups if it's there + if (x.group != null && groups[x.group] != null) { + var index = groups[x.group].indexOf(dygraph); + if (index != -1) + groups[x.group].splice(index, 1); + } + + // destroy the existing dygraph + dygraph.destroy(); + dygraph = null; + } + + // create the dygraph and add it to it's group (if any) + dygraph = thiz.dygraph = new Dygraph(el, attrs.file, attrs); + dygraph.userDateWindow = attrs.dateWindow; + if (x.group != null) + groups[x.group].push(dygraph); + + // add shiny inputs for date window and click + if (HTMLWidgets.shinyMode) { + var isDate = x.format == "date"; + this.addClickShinyInput(el.id, isDate); + this.addDateWindowShinyInput(el.id, isDate); + } + + // set annotations + if (x.annotations != null) { + dygraph.ready(function() { + if (x.format == "date") { + x.annotations.map(function(annotation) { + var date = thiz.normalizeDateValue(x.scale, annotation.x, x.fixedtz); + annotation.x = date.getTime(); + }); + } + dygraph.setAnnotations(x.annotations); + }); + } + + }, + + customDateTickerFixedTZ : function(tz){ + return function(t,e,a,i,r) { + var a=Dygraph.pickDateTickGranularity(t,e,a,i); + if(a >= 0){ + + var n=i("axisLabelFormatter"), + o=i("labelsUTC"), + s=o?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal; + l=Dygraph.TICK_PLACEMENT[a].datefield; + h=Dygraph.TICK_PLACEMENT[a].step; + 
p=Dygraph.TICK_PLACEMENT[a].spacing; + + var y = []; + var d = moment(t); + d.tz(tz); + d.millisecond(0); + + if(l > Dygraph.DATEFIELD_M){ + var x; + if (l === Dygraph.DATEFIELD_SS) { // seconds + x = d.second(); + d.second(x - x % h); + } else if(l === Dygraph.DATEFIELD_MM){ + d.second(0) + x = d.minute(); + d.minute(x - x % h); + } else if(l === Dygraph.DATEFIELD_HH){ + d.second(0); + d.minute(0); + x = d.hour(); + d.hour(x - x % h); + } else if(l === Dygraph.DATEFIELD_D){ + d.second(0); + d.minute(0); + d.hour(0); + if (h == 7) { // one week + d.startOf('week'); + } + } + + v = d.valueOf(); + _=moment(v).tz(tz); + + // For spacings coarser than two-hourly, we want to ignore daylight + // savings transitions to get consistent ticks. For finer-grained ticks, + // it's essential to show the DST transition in all its messiness. + var start_offset_min = moment(v).tz(tz).zone(); + var check_dst = (p >= Dygraph.TICK_PLACEMENT[Dygraph.TWO_HOURLY].spacing); + + if(a<=Dygraph.HOURLY){ + for(t>v&&(v+=p,_=moment(v).tz(tz));e>=v;){ + y.push({v:v,label:n(_,a,i,r)}); + v+=p; + _=moment(v).tz(tz); + } + }else{ + for(t>v&&(v+=p,_=moment(v).tz(tz));e>=v;){ + + // This ensures that we stay on the same hourly "rhythm" across + // daylight savings transitions. Without this, the ticks could get off + // by an hour. See tests/daylight-savings.html or issue 147. + if (check_dst && _.zone() != start_offset_min) { + var delta_min = _.zone() - start_offset_min; + v += delta_min * 60 * 1000; + _= moment(v).tz(tz); + start_offset_min = _.zone(); + + // Check whether we've backed into the previous timezone again. + // This can happen during a "spring forward" transition. In this case, + // it's best to skip this tick altogether (we may be shooting for a + // non-existent time like the 2AM that's skipped) and go to the next + // one. 
+ if (moment(v + p).tz(tz).zone() != start_offset_min) { + v += p; + _= moment(v).tz(tz); + start_offset_min = _.zone(); + } + } + + (a>=Dygraph.DAILY||_.get('hour')%h===0)&&y.push({v:v,label:n(_,a,i,r)}); + v+=p; + _=moment(v).tz(tz); + } + } + }else{ + var start_year = moment(t).tz(tz).year(); + var end_year = moment(e).tz(tz).year(); + var start_month = moment(t).tz(tz).month(); + + if(l === Dygraph.DATEFIELD_M){ + var step_month = h; + for (var ii = start_year; ii <= end_year; ii++) { + for (var j = 0; j < 12;) { + var dt = moment(new Date(ii, j, 1)).tz(tz); + // fix some tz bug + dt.year(ii); + dt.month(j); + dt.date(1); + dt.hour(0); + v = dt.valueOf(); + y.push({v:v,label:n(moment(v).tz(tz),a,i,r)}); + j+=step_month; + } + } + }else{ + var step_year = h; + for (var ii = start_year; ii <= end_year;) { + var dt = moment(new Date(ii, 1, 1)).tz(tz); + // fix some tz bug + dt.year(ii); + dt.month(j); + dt.date(1); + dt.hour(0); + v = dt.valueOf(); + y.push({v:v,label:n(moment(v).tz(tz),a,i,r)}); + ii+=step_year; + } + } + } + return y; + }else{ + return []; + } + }; + }, + + xAxisLabelFormatterFixedTZ : function(tz){ + + return function dateAxisFormatter(date, granularity){ + var mmnt = moment(date).tz(tz); + if (granularity >= Dygraph.DECADAL){ + return mmnt.format('YYYY'); + }else{ + if(granularity >= Dygraph.MONTHLY){ + return mmnt.format('MMM YYYY'); + }else{ + var frac = mmnt.hour() * 3600 + mmnt.minute() * 60 + mmnt.second() + mmnt.millisecond(); + if (frac === 0 || granularity >= Dygraph.DAILY) { + return mmnt.format('DD MMM'); + } else { + if (mmnt.second()) { + return mmnt.format('HH:mm:ss'); + } else { + return mmnt.format('HH:mm'); + } + } + } + + } + } + }, + + xValueFormatterFixedTZ: function(scale, tz) { + + return function(millis) { + var mmnt = moment(millis).tz(tz); + if (scale == "yearly") + return mmnt.format('YYYY') + ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "quarterly") + return mmnt.fquarter(1) + ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "monthly") + return mmnt.format('MMM, YYYY')+ ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "daily" || scale == "weekly") + return mmnt.format('MMM, DD, YYYY')+ ' (' + mmnt.zoneAbbr() + ')'; + else + return mmnt.format('dddd, MMMM DD, YYYY HH:mm:ss')+ ' (' + mmnt.zoneAbbr() + ')'; + } + }, + + xValueFormatter: function(scale) { + + var monthNames = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", + "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]; + + return function(millis) { + var date = new Date(millis); + if (scale == "yearly") + return date.getFullYear(); + else if (scale == "quarterly") + return moment(millis).fquarter(1); + else if (scale == "monthly") + return monthNames[date.getMonth()] + ', ' + date.getFullYear(); + else if (scale == "daily" || scale == "weekly") + return monthNames[date.getMonth()] + ', ' + + date.getDate() + ', ' + + date.getFullYear(); + else + return date.toLocaleString(); + } + }, + + addZoomCallback: function(x) { + + // alias this + var thiz = this; + + // get attrs + var attrs = x.attrs; + + // check for an existing zoomCallback + var prevZoomCallback = attrs["zoomCallback"]; + + attrs.zoomCallback = function(minDate, maxDate, yRanges) { + + // call existing + if (prevZoomCallback) + prevZoomCallback(minDate, maxDate, yRanges); + + // record user date window (or lack thereof) + if (dygraph.xAxisExtremes()[0] != minDate || + dygraph.xAxisExtremes()[1] != maxDate) { + dygraph.userDateWindow = [minDate, maxDate]; + } else { + dygraph.userDateWindow = null; + } + + // record in group if 
necessary + if (x.group != null && groups[x.group] != null) { + var group = groups[x.group]; + for(var i = 0; i=0.1){ + var dashLength = dashArray[dashIndex++%dashCount]; + if (dashLength > distRemaining) dashLength = distRemaining; + var xStep = Math.sqrt( dashLength*dashLength / (1 + slope*slope) ); + if (dx<0) xStep = -xStep; + x += xStep + y += slope*xStep; + canvas[draw ? 'lineTo' : 'moveTo'](x,y); + distRemaining -= dashLength; + draw = !draw; + } + canvas.stroke(); + }, + + setFontSize: function(canvas, size) { + var cFont = canvas.font; + var parts = cFont.split(' '); + if (parts.length === 2) + canvas.font = size + 'px ' + parts[1]; + else if (parts.length === 3) + canvas.font = parts[0] + ' ' + size + 'px ' + parts[2]; + }, + + // Returns the value of a GET variable + queryVar: function(name) { + return decodeURI(window.location.search.replace( + new RegExp("^(?:.*[&\\?]" + + encodeURI(name).replace(/[\.\+\*]/g, "\\$&") + + "(?:\\=([^&]*))?)?.*$", "i"), + "$1")); + }, + + // We deal exclusively in UTC dates within R, however dygraphs deals + // exclusively in the local time zone. Therefore, in order to plot date + // labels that make sense to the user when we are dealing with days, + // months or years we need to convert the UTC date value to a local time + // value that "looks like" the equivilant UTC value. To do this we add the + // timezone offset to the UTC date. + // Don't use in case of fixedtz + normalizeDateValue: function(scale, value, fixedtz) { + var date = new Date(value); + if (scale != "minute" && scale != "hourly" && scale != "seconds" && !fixedtz) { + var localAsUTC = date.getTime() + (date.getTimezoneOffset() * 60000); + date = new Date(localAsUTC); + } + return date; + }, + + // safely detect rendering on a mobile phone + isMobilePhone: function() { + try + { + return ! 
window.matchMedia("only screen and (min-width: 768px)").matches; + } + catch(e) { + return false; + } + }, + + + resize: function(width, height) { + if (dygraph) + dygraph.resize(); + }, + + // export dygraph so other code can get a hold of it + dygraph: null + + }; + }, + + // track groups globally + groups: {} + +}); + diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf new file mode 100644 index 000000000..35acda2fa Binary files /dev/null and b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf differ diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css new file mode 100644 index 000000000..8e5bb8a3c --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css @@ -0,0 +1,99 @@ +.book .book-header h1 { + padding-left: 20px; + padding-right: 20px; +} +.book .book-header.fixed { + position: fixed; + right: 0; + top: 0; + left: 0; + border-bottom: 1px solid rgba(0,0,0,.07); +} +span.search-highlight { + background-color: #ffff88; +} +@media (min-width: 600px) { + .book.with-summary .book-header.fixed { + left: 300px; + } +} +@media (max-width: 1240px) { + .book .book-body.fixed { + top: 50px; + } + .book .book-body.fixed .body-inner { + top: auto; + } +} +@media (max-width: 600px) { + .book.with-summary .book-header.fixed { + left: calc(100% - 60px); + min-width: 300px; + } + .book.with-summary .book-body { + transform: none; + left: calc(100% - 60px); + min-width: 300px; + } + .book .book-body.fixed { + top: 0; + } +} + +.book .book-body.fixed .body-inner { + top: 50px; +} +.book .book-body .page-wrapper .page-inner section.normal sub, .book .book-body .page-wrapper .page-inner section.normal sup { + font-size: 85%; +} + +@media print { + .book .book-summary, .book .book-body .book-header, .fa { + display: none !important; + } + .book .book-body.fixed { + left: 0px; + } + .book .book-body,.book .book-body .body-inner, .book.with-summary { + overflow: visible !important; + } +} +.kable_wrapper { + border-spacing: 20px 0; + border-collapse: separate; + border: none; + margin: auto; +} +.kable_wrapper > tbody > tr > td { + vertical-align: top; +} +.book .book-body .page-wrapper .page-inner section.normal table tr.header { + border-top-width: 2px; +} +.book .book-body .page-wrapper .page-inner section.normal table tr:last-child td { + border-bottom-width: 2px; +} +.book .book-body .page-wrapper .page-inner section.normal table td, .book .book-body .page-wrapper .page-inner section.normal table th { + border-left: none; + border-right: none; +} +.book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr, .book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr > td { + border-top: none; +} +.book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr:last-child > td { + border-bottom: none; +} + +div.theorem, div.lemma, div.corollary, div.proposition, div.conjecture { + font-style: italic; +} +span.theorem, span.lemma, span.corollary, span.proposition, span.conjecture { + font-style: normal; +} +div.proof:after { + content: "\25a2"; + float: right; +} +.header-section-number { + padding-right: .5em; +} diff --git 
a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css new file mode 100644 index 000000000..87236b4c0 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css @@ -0,0 +1,292 @@ +/* + * Theme 1 + */ +.color-theme-1 .dropdown-menu { + background-color: #111111; + border-color: #7e888b; +} +.color-theme-1 .dropdown-menu .dropdown-caret .caret-inner { + border-bottom: 9px solid #111111; +} +.color-theme-1 .dropdown-menu .buttons { + border-color: #7e888b; +} +.color-theme-1 .dropdown-menu .button { + color: #afa790; +} +.color-theme-1 .dropdown-menu .button:hover { + color: #73553c; +} +/* + * Theme 2 + */ +.color-theme-2 .dropdown-menu { + background-color: #2d3143; + border-color: #272a3a; +} +.color-theme-2 .dropdown-menu .dropdown-caret .caret-inner { + border-bottom: 9px solid #2d3143; +} +.color-theme-2 .dropdown-menu .buttons { + border-color: #272a3a; +} +.color-theme-2 .dropdown-menu .button { + color: #62677f; +} +.color-theme-2 .dropdown-menu .button:hover { + color: #f4f4f5; +} +.book .book-header .font-settings .font-enlarge { + line-height: 30px; + font-size: 1.4em; +} +.book .book-header .font-settings .font-reduce { + line-height: 30px; + font-size: 1em; +} +.book.color-theme-1 .book-body { + color: #704214; + background: #f3eacb; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section { + background: #f3eacb; +} +.book.color-theme-2 .book-body { + color: #bdcadb; + background: #1c1f2b; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section { + background: #1c1f2b; +} +.book.font-size-0 .book-body .page-inner section { + font-size: 1.2rem; +} +.book.font-size-1 .book-body .page-inner section { + font-size: 1.4rem; +} +.book.font-size-2 .book-body .page-inner section { + font-size: 1.6rem; +} +.book.font-size-3 .book-body .page-inner section { + font-size: 2.2rem; +} +.book.font-size-4 .book-body .page-inner section { + font-size: 4rem; +} +.book.font-family-0 { + font-family: Georgia, serif; +} +.book.font-family-1 { + font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal { + color: #704214; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal a { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h2, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h3, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h4, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h5, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h6 { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h2 { + border-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h6 { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal hr { + background-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal blockquote { + border-color: #c4b29f; + opacity: 0.9; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal 
code { + background: #fdf6e3; + color: #657b83; + border-color: #f8df9c; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal .highlight { + background-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table th, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table td { + border-color: #f5d06c; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table tr { + color: inherit; + background-color: #fdf6e3; + border-color: #444444; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n) { + background-color: #fbeecb; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal { + color: #bdcadb; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal a { + color: #3eb1d0; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h2, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h3, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h4, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h5, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h6 { + color: #fffffa; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h2 { + border-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h6 { + color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal hr { + background-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal blockquote { + border-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code { + color: #9dbed8; + background: #2d3143; + border-color: #2d3143; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal .highlight { + background-color: #282a39; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table th, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table td { + border-color: #3b3f54; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table tr { + color: #b6c2d2; + background-color: #2d3143; + border-color: #3b3f54; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n) { + background-color: #35394b; +} +.book.color-theme-1 .book-header { + color: #afa790; + background: transparent; +} +.book.color-theme-1 .book-header .btn { + color: #afa790; +} +.book.color-theme-1 .book-header .btn:hover { + color: #73553c; + background: none; +} +.book.color-theme-1 .book-header h1 { + color: #704214; +} +.book.color-theme-2 .book-header { + color: #7e888b; + background: transparent; +} +.book.color-theme-2 .book-header .btn { + color: #3b3f54; +} +.book.color-theme-2 .book-header .btn:hover { + color: #fffff5; + background: none; +} +.book.color-theme-2 .book-header h1 { + color: #bdcadb; +} +.book.color-theme-1 .book-body .navigation { + color: #afa790; +} +.book.color-theme-1 .book-body .navigation:hover { + color: #73553c; +} +.book.color-theme-2 .book-body .navigation { + color: #383f52; +} +.book.color-theme-2 .book-body .navigation:hover { + color: 
#fffff5; +} +/* + * Theme 1 + */ +.book.color-theme-1 .book-summary { + color: #afa790; + background: #111111; + border-right: 1px solid rgba(0, 0, 0, 0.07); +} +.book.color-theme-1 .book-summary .book-search { + background: transparent; +} +.book.color-theme-1 .book-summary .book-search input, +.book.color-theme-1 .book-summary .book-search input:focus { + border: 1px solid transparent; +} +.book.color-theme-1 .book-summary ul.summary li.divider { + background: #7e888b; + box-shadow: none; +} +.book.color-theme-1 .book-summary ul.summary li i.fa-check { + color: #33cc33; +} +.book.color-theme-1 .book-summary ul.summary li.done > a { + color: #877f6a; +} +.book.color-theme-1 .book-summary ul.summary li a, +.book.color-theme-1 .book-summary ul.summary li span { + color: #877f6a; + background: transparent; + font-weight: normal; +} +.book.color-theme-1 .book-summary ul.summary li.active > a, +.book.color-theme-1 .book-summary ul.summary li a:hover { + color: #704214; + background: transparent; + font-weight: normal; +} +/* + * Theme 2 + */ +.book.color-theme-2 .book-summary { + color: #bcc1d2; + background: #2d3143; + border-right: none; +} +.book.color-theme-2 .book-summary .book-search { + background: transparent; +} +.book.color-theme-2 .book-summary .book-search input, +.book.color-theme-2 .book-summary .book-search input:focus { + border: 1px solid transparent; +} +.book.color-theme-2 .book-summary ul.summary li.divider { + background: #272a3a; + box-shadow: none; +} +.book.color-theme-2 .book-summary ul.summary li i.fa-check { + color: #33cc33; +} +.book.color-theme-2 .book-summary ul.summary li.done > a { + color: #62687f; +} +.book.color-theme-2 .book-summary ul.summary li a, +.book.color-theme-2 .book-summary ul.summary li span { + color: #c1c6d7; + background: transparent; + font-weight: 600; +} +.book.color-theme-2 .book-summary ul.summary li.active > a, +.book.color-theme-2 .book-summary ul.summary li a:hover { + color: #f4f4f5; + background: #252737; + font-weight: 600; +} diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css new file mode 100644 index 000000000..2aabd3deb --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css @@ -0,0 +1,426 @@ +.book .book-body .page-wrapper .page-inner section.normal pre, +.book .book-body .page-wrapper .page-inner section.normal code { + /* http://jmblog.github.com/color-themes-for-google-code-highlightjs */ + /* Tomorrow Comment */ + /* Tomorrow Red */ + /* Tomorrow Orange */ + /* Tomorrow Yellow */ + /* Tomorrow Green */ + /* Tomorrow Aqua */ + /* Tomorrow Blue */ + /* Tomorrow Purple */ +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-title { + color: #8e908c; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-tag, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-tag, +.book .book-body 
.page-wrapper .page-inner section.normal pre .hljs-regexp, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-regexp, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-tag .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-tag .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-pi, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-pi, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal pre .html .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal code .html .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-id, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-id, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-class, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-class, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo { + color: #c82829; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-literal, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-literal, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-params, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-params, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-constant { + color: #f5871f; +} +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-class .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-class .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-rules .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-rules .hljs-attribute { + color: #eab700; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-string, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-value, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-value, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-inheritance, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-inheritance, +.book .book-body .page-wrapper .page-inner 
section.normal pre .hljs-header, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-header, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-symbol, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-symbol, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + color: #718c00; +} +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-hexcolor, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-hexcolor { + color: #3e999f; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal pre .python .hljs-decorator, +.book .book-body .page-wrapper .page-inner section.normal code .python .hljs-decorator, +.book .book-body .page-wrapper .page-inner section.normal pre .python .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .python .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-function .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-function .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-title .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-title .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal pre .perl .hljs-sub, +.book .book-body .page-wrapper .page-inner section.normal code .perl .hljs-sub, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .coffeescript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .coffeescript .hljs-title { + color: #4271ae; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-function { + color: #8959a8; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book .book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + background: white; + color: #4d4d4c; + padding: 0.5em; +} +.book .book-body .page-wrapper .page-inner section.normal pre .coffeescript .javascript, +.book .book-body .page-wrapper .page-inner section.normal code .coffeescript .javascript, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .xml, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .xml, +.book .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .javascript, +.book .book-body .page-wrapper .page-inner section.normal code .xml .javascript, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .vbscript, +.book .book-body .page-wrapper .page-inner section.normal code .xml .vbscript, +.book .book-body .page-wrapper .page-inner 
section.normal pre .xml .css, +.book .book-body .page-wrapper .page-inner section.normal code .xml .css, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + opacity: 0.5; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code { + /* + +Orginal Style from ethanschoonover.com/solarized (c) Jeremy Hull + +*/ + /* Solarized Green */ + /* Solarized Cyan */ + /* Solarized Blue */ + /* Solarized Yellow */ + /* Solarized Orange */ + /* Solarized Red */ + /* Solarized Violet */ +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + padding: 0.5em; + background: #fdf6e3; + color: #657b83; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-template_comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-template_comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .diff .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .diff .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-doctype, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-doctype, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-pi, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-pi, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .lisp .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .lisp .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-javadoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-javadoc { + color: #93a1a1; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-winutils, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-winutils, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .method, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .method, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-addition, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-addition, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-tag, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-tag, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-request, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-request, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-status, +.book.color-theme-1 .book-body .page-wrapper .page-inner 
section.normal code .hljs-status, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .nginx .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .nginx .hljs-title { + color: #859900; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-command, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-command, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-tag .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-tag .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-rules .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-rules .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-phpdoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-phpdoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-regexp, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-regexp, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-hexcolor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-hexcolor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_url, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_url { + color: #2aa198; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-localvars, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-localvars, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-chunk, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-chunk, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-decorator, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-decorator, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-identifier, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-identifier, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .vhdl .hljs-literal, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .vhdl .hljs-literal, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal 
pre .hljs-id, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-id, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-function, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-function { + color: #268bd2; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .lisp .hljs-body, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .lisp .hljs-body, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .smalltalk .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .smalltalk .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-constant, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-class .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-class .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-parent, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-parent, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .haskell .hljs-type, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .haskell .hljs-type, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_reference, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_reference { + color: #b58900; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-shebang, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-shebang, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-symbol, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-symbol, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-symbol .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-symbol .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .diff .hljs-change, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .diff .hljs-change, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-special, 
+.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-special, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-attr_selector, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-attr_selector, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-subst, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-subst, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-cdata, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-cdata, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .clojure .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .clojure .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-header { + color: #cb4b16; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-deletion, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-deletion, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-important, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-important { + color: #dc322f; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_label, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_label { + color: #6c71c4; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula { + background: #eee8d5; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code { + /* Tomorrow Night Bright Theme */ + /* Original theme - https://github.com/chriskempson/tomorrow-theme */ + /* http://jmblog.github.com/color-themes-for-google-code-highlightjs */ + /* Tomorrow Comment */ + /* Tomorrow Red */ + /* Tomorrow Orange */ + /* Tomorrow Yellow */ + /* Tomorrow Green */ + /* Tomorrow Aqua */ + /* Tomorrow Blue */ + /* Tomorrow Purple */ +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-title { + color: #969896; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-tag, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-tag, 
+.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-regexp, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-regexp, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-tag .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-tag .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-pi, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-pi, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .html .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .html .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-id, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-id, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-class, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-class, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo { + color: #d54e53; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-literal, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-literal, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-params, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-params, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-constant { + color: #e78c45; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-class .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-class .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-rules .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-rules .hljs-attribute { + color: #e7c547; +} 
+.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-string, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-value, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-value, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-inheritance, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-inheritance, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-header, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-header, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-symbol, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-symbol, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + color: #b9ca4a; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-hexcolor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-hexcolor { + color: #70c0b1; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .python .hljs-decorator, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .python .hljs-decorator, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .python .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .python .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-function .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-function .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-title .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-title .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .perl .hljs-sub, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .perl .hljs-sub, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .coffeescript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .coffeescript .hljs-title { + color: #7aa6da; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-function { + color: #c397d8; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book.color-theme-2 
.book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + background: black; + color: #eaeaea; + padding: 0.5em; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .coffeescript .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .coffeescript .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .xml, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .xml, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .vbscript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .vbscript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .css, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .css, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + opacity: 0.5; +} diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css new file mode 100644 index 000000000..d7ff2d991 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css @@ -0,0 +1,28 @@ +.book .book-summary .book-search { + padding: 6px; + background: transparent; + position: absolute; + top: -50px; + left: 0px; + right: 0px; + transition: top 0.5s ease; +} +.book .book-summary .book-search input, +.book .book-summary .book-search input:focus, +.book .book-summary .book-search input:hover { + width: 100%; + background: transparent; + border: 1px solid #ccc; + box-shadow: none; + outline: none; + line-height: 22px; + padding: 7px 4px; + color: inherit; + box-sizing: border-box; +} +.book.with-search .book-summary .book-search { + top: 0px; +} +.book.with-search .book-summary ul.summary { + top: 50px; +} diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css new file mode 100644 index 000000000..7fba1b9fb --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css @@ -0,0 +1 @@ +.book .book-body .page-wrapper .page-inner section.normal table{display:table;width:100%;border-collapse:collapse;border-spacing:0;overflow:auto}.book .book-body .page-wrapper .page-inner section.normal table td,.book .book-body .page-wrapper .page-inner section.normal table th{padding:6px 13px;border:1px solid #ddd}.book .book-body .page-wrapper .page-inner section.normal table tr{background-color:#fff;border-top:1px solid #ccc}.book .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n){background-color:#f8f8f8}.book .book-body .page-wrapper .page-inner section.normal table th{font-weight:700} diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css new file mode 100644 index 
000000000..b89689209 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css @@ -0,0 +1,10 @@ +/*! normalize.css v2.1.0 | MIT License | git.io/normalize */img,legend{border:0}*,.fa{-webkit-font-smoothing:antialiased}.fa-ul>li,sub,sup{position:relative}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book-langs-index .inner .languages:after,.buttons:after,.dropdown-menu .buttons:after{clear:both}body,html{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}article,aside,details,figcaption,figure,footer,header,hgroup,main,nav,section,summary{display:block}audio,canvas,video{display:inline-block}.hidden,[hidden]{display:none}audio:not([controls]){display:none;height:0}html{font-family:sans-serif}body,figure{margin:0}a:focus{outline:dotted thin}a:active,a:hover{outline:0}h1{font-size:2em;margin:.67em 0}abbr[title]{border-bottom:1px dotted}b,strong{font-weight:700}dfn{font-style:italic}hr{-moz-box-sizing:content-box;box-sizing:content-box;height:0}mark{background:#ff0;color:#000}code,kbd,pre,samp{font-family:monospace,serif;font-size:1em}pre{white-space:pre-wrap}q{quotes:"\201C" "\201D" "\2018" "\2019"}small{font-size:80%}sub,sup{font-size:75%;line-height:0;vertical-align:baseline}sup{top:-.5em}sub{bottom:-.25em}svg:not(:root){overflow:hidden}fieldset{border:1px solid silver;margin:0 2px;padding:.35em .625em .75em}legend{padding:0}button,input,select,textarea{font-family:inherit;font-size:100%;margin:0}button,input{line-height:normal}button,select{text-transform:none}button,html input[type=button],input[type=reset],input[type=submit]{-webkit-appearance:button;cursor:pointer}button[disabled],html input[disabled]{cursor:default}input[type=checkbox],input[type=radio]{box-sizing:border-box;padding:0}input[type=search]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box}input[type=search]::-webkit-search-cancel-button{margin-right:10px;}button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}textarea{overflow:auto;vertical-align:top}table{border-collapse:collapse;border-spacing:0}/*! + * Preboot v2 + * + * Open sourced under MIT license by @mdo. + * Some variables and mixins from Bootstrap (Apache 2 license). + */.link-inherit,.link-inherit:focus,.link-inherit:hover{color:inherit}.fa,.fa-stack{display:inline-block}/*! 
+ * Font Awesome 4.1.0 by @davegandy - http://fontawesome.io - @fontawesome + * License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License) + */@font-face{font-family:FontAwesome;src:url(./fontawesome/fontawesome-webfont.ttf?v=4.1.0) format('truetype');font-weight:400;font-style:normal}.fa{font-family:FontAwesome;font-style:normal;font-weight:400;line-height:1;-moz-osx-font-smoothing:grayscale}.book .book-header,.book .book-summary{font-family:"Helvetica Neue",Helvetica,Arial,sans-serif}.fa-lg{font-size:1.33333333em;line-height:.75em;vertical-align:-15%}.fa-2x{font-size:2em}.fa-3x{font-size:3em}.fa-4x{font-size:4em}.fa-5x{font-size:5em}.fa-fw{width:1.28571429em;text-align:center}.fa-ul{padding-left:0;margin-left:2.14285714em;list-style-type:none}.fa-li{position:absolute;left:-2.14285714em;width:2.14285714em;top:.14285714em;text-align:center}.fa-li.fa-lg{left:-1.85714286em}.fa-border{padding:.2em .25em .15em;border:.08em solid #eee;border-radius:.1em}.pull-right{float:right}.pull-left{float:left}.fa.pull-left{margin-right:.3em}.fa.pull-right{margin-left:.3em}.fa-spin{-webkit-animation:spin 2s infinite linear;-moz-animation:spin 2s infinite linear;-o-animation:spin 2s infinite linear;animation:spin 2s infinite linear}@-moz-keyframes spin{0%{-moz-transform:rotate(0)}100%{-moz-transform:rotate(359deg)}}@-webkit-keyframes spin{0%{-webkit-transform:rotate(0)}100%{-webkit-transform:rotate(359deg)}}@-o-keyframes spin{0%{-o-transform:rotate(0)}100%{-o-transform:rotate(359deg)}}@keyframes spin{0%{-webkit-transform:rotate(0);transform:rotate(0)}100%{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}.fa-rotate-90{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=1);-webkit-transform:rotate(90deg);-moz-transform:rotate(90deg);-ms-transform:rotate(90deg);-o-transform:rotate(90deg);transform:rotate(90deg)}.fa-rotate-180{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=2);-webkit-transform:rotate(180deg);-moz-transform:rotate(180deg);-ms-transform:rotate(180deg);-o-transform:rotate(180deg);transform:rotate(180deg)}.fa-rotate-270{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=3);-webkit-transform:rotate(270deg);-moz-transform:rotate(270deg);-ms-transform:rotate(270deg);-o-transform:rotate(270deg);transform:rotate(270deg)}.fa-flip-horizontal{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=0, mirror=1);-webkit-transform:scale(-1,1);-moz-transform:scale(-1,1);-ms-transform:scale(-1,1);-o-transform:scale(-1,1);transform:scale(-1,1)}.fa-flip-vertical{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=2, 
mirror=1);-webkit-transform:scale(1,-1);-moz-transform:scale(1,-1);-ms-transform:scale(1,-1);-o-transform:scale(1,-1);transform:scale(1,-1)}.fa-stack{position:relative;width:2em;height:2em;line-height:2em;vertical-align:middle}.fa-stack-1x,.fa-stack-2x{position:absolute;left:0;width:100%;text-align:center}.fa-stack-1x{line-height:inherit}.fa-stack-2x{font-size:2em}.fa-inverse{color:#fff}.fa-glass:before{content:"\f000"}.fa-music:before{content:"\f001"}.fa-search:before{content:"\f002"}.fa-envelope-o:before{content:"\f003"}.fa-heart:before{content:"\f004"}.fa-star:before{content:"\f005"}.fa-star-o:before{content:"\f006"}.fa-user:before{content:"\f007"}.fa-film:before{content:"\f008"}.fa-th-large:before{content:"\f009"}.fa-th:before{content:"\f00a"}.fa-th-list:before{content:"\f00b"}.fa-check:before{content:"\f00c"}.fa-times:before{content:"\f00d"}.fa-search-plus:before{content:"\f00e"}.fa-search-minus:before{content:"\f010"}.fa-power-off:before{content:"\f011"}.fa-signal:before{content:"\f012"}.fa-cog:before,.fa-gear:before{content:"\f013"}.fa-trash-o:before{content:"\f014"}.fa-home:before{content:"\f015"}.fa-file-o:before{content:"\f016"}.fa-clock-o:before{content:"\f017"}.fa-road:before{content:"\f018"}.fa-download:before{content:"\f019"}.fa-arrow-circle-o-down:before{content:"\f01a"}.fa-arrow-circle-o-up:before{content:"\f01b"}.fa-inbox:before{content:"\f01c"}.fa-play-circle-o:before{content:"\f01d"}.fa-repeat:before,.fa-rotate-right:before{content:"\f01e"}.fa-refresh:before{content:"\f021"}.fa-list-alt:before{content:"\f022"}.fa-lock:before{content:"\f023"}.fa-flag:before{content:"\f024"}.fa-headphones:before{content:"\f025"}.fa-volume-off:before{content:"\f026"}.fa-volume-down:before{content:"\f027"}.fa-volume-up:before{content:"\f028"}.fa-qrcode:before{content:"\f029"}.fa-barcode:before{content:"\f02a"}.fa-tag:before{content:"\f02b"}.fa-tags:before{content:"\f02c"}.fa-book:before{content:"\f02d"}.fa-bookmark:before{content:"\f02e"}.fa-print:before{content:"\f02f"}.fa-camera:before{content:"\f030"}.fa-font:before{content:"\f031"}.fa-bold:before{content:"\f032"}.fa-italic:before{content:"\f033"}.fa-text-height:before{content:"\f034"}.fa-text-width:before{content:"\f035"}.fa-align-left:before{content:"\f036"}.fa-align-center:before{content:"\f037"}.fa-align-right:before{content:"\f038"}.fa-align-justify:before{content:"\f039"}.fa-list:before{content:"\f03a"}.fa-dedent:before,.fa-outdent:before{content:"\f03b"}.fa-indent:before{content:"\f03c"}.fa-video-camera:before{content:"\f03d"}.fa-image:before,.fa-photo:before,.fa-picture-o:before{content:"\f03e"}.fa-pencil:before{content:"\f040"}.fa-map-marker:before{content:"\f041"}.fa-adjust:before{content:"\f042"}.fa-tint:before{content:"\f043"}.fa-edit:before,.fa-pencil-square-o:before{content:"\f044"}.fa-share-square-o:before{content:"\f045"}.fa-check-square-o:before{content:"\f046"}.fa-arrows:before{content:"\f047"}.fa-step-backward:before{content:"\f048"}.fa-fast-backward:before{content:"\f049"}.fa-backward:before{content:"\f04a"}.fa-play:before{content:"\f04b"}.fa-pause:before{content:"\f04c"}.fa-stop:before{content:"\f04d"}.fa-forward:before{content:"\f04e"}.fa-fast-forward:before{content:"\f050"}.fa-step-forward:before{content:"\f051"}.fa-eject:before{content:"\f052"}.fa-chevron-left:before{content:"\f053"}.fa-chevron-right:before{content:"\f054"}.fa-plus-circle:before{content:"\f055"}.fa-minus-circle:before{content:"\f056"}.fa-times-circle:before{content:"\f057"}.fa-check-circle:before{content:"\f058"}.fa-question-circle:before{content:"\
f059"}.fa-info-circle:before{content:"\f05a"}.fa-crosshairs:before{content:"\f05b"}.fa-times-circle-o:before{content:"\f05c"}.fa-check-circle-o:before{content:"\f05d"}.fa-ban:before{content:"\f05e"}.fa-arrow-left:before{content:"\f060"}.fa-arrow-right:before{content:"\f061"}.fa-arrow-up:before{content:"\f062"}.fa-arrow-down:before{content:"\f063"}.fa-mail-forward:before,.fa-share:before{content:"\f064"}.fa-expand:before{content:"\f065"}.fa-compress:before{content:"\f066"}.fa-plus:before{content:"\f067"}.fa-minus:before{content:"\f068"}.fa-asterisk:before{content:"\f069"}.fa-exclamation-circle:before{content:"\f06a"}.fa-gift:before{content:"\f06b"}.fa-leaf:before{content:"\f06c"}.fa-fire:before{content:"\f06d"}.fa-eye:before{content:"\f06e"}.fa-eye-slash:before{content:"\f070"}.fa-exclamation-triangle:before,.fa-warning:before{content:"\f071"}.fa-plane:before{content:"\f072"}.fa-calendar:before{content:"\f073"}.fa-random:before{content:"\f074"}.fa-comment:before{content:"\f075"}.fa-magnet:before{content:"\f076"}.fa-chevron-up:before{content:"\f077"}.fa-chevron-down:before{content:"\f078"}.fa-retweet:before{content:"\f079"}.fa-shopping-cart:before{content:"\f07a"}.fa-folder:before{content:"\f07b"}.fa-folder-open:before{content:"\f07c"}.fa-arrows-v:before{content:"\f07d"}.fa-arrows-h:before{content:"\f07e"}.fa-bar-chart-o:before{content:"\f080"}.fa-twitter-square:before{content:"\f081"}.fa-facebook-square:before{content:"\f082"}.fa-camera-retro:before{content:"\f083"}.fa-key:before{content:"\f084"}.fa-cogs:before,.fa-gears:before{content:"\f085"}.fa-comments:before{content:"\f086"}.fa-thumbs-o-up:before{content:"\f087"}.fa-thumbs-o-down:before{content:"\f088"}.fa-star-half:before{content:"\f089"}.fa-heart-o:before{content:"\f08a"}.fa-sign-out:before{content:"\f08b"}.fa-linkedin-square:before{content:"\f08c"}.fa-thumb-tack:before{content:"\f08d"}.fa-external-link:before{content:"\f08e"}.fa-sign-in:before{content:"\f090"}.fa-trophy:before{content:"\f091"}.fa-github-square:before{content:"\f092"}.fa-upload:before{content:"\f093"}.fa-lemon-o:before{content:"\f094"}.fa-phone:before{content:"\f095"}.fa-square-o:before{content:"\f096"}.fa-bookmark-o:before{content:"\f097"}.fa-phone-square:before{content:"\f098"}.fa-twitter:before{content:"\f099"}.fa-facebook:before{content:"\f09a"}.fa-github:before{content:"\f09b"}.fa-unlock:before{content:"\f09c"}.fa-credit-card:before{content:"\f09d"}.fa-rss:before{content:"\f09e"}.fa-hdd-o:before{content:"\f0a0"}.fa-bullhorn:before{content:"\f0a1"}.fa-bell:before{content:"\f0f3"}.fa-certificate:before{content:"\f0a3"}.fa-hand-o-right:before{content:"\f0a4"}.fa-hand-o-left:before{content:"\f0a5"}.fa-hand-o-up:before{content:"\f0a6"}.fa-hand-o-down:before{content:"\f0a7"}.fa-arrow-circle-left:before{content:"\f0a8"}.fa-arrow-circle-right:before{content:"\f0a9"}.fa-arrow-circle-up:before{content:"\f0aa"}.fa-arrow-circle-down:before{content:"\f0ab"}.fa-globe:before{content:"\f0ac"}.fa-wrench:before{content:"\f0ad"}.fa-tasks:before{content:"\f0ae"}.fa-filter:before{content:"\f0b0"}.fa-briefcase:before{content:"\f0b1"}.fa-arrows-alt:before{content:"\f0b2"}.fa-group:before,.fa-users:before{content:"\f0c0"}.fa-chain:before,.fa-link:before{content:"\f0c1"}.fa-cloud:before{content:"\f0c2"}.fa-flask:before{content:"\f0c3"}.fa-cut:before,.fa-scissors:before{content:"\f0c4"}.fa-copy:before,.fa-files-o:before{content:"\f0c5"}.fa-paperclip:before{content:"\f0c6"}.fa-floppy-o:before,.fa-save:before{content:"\f0c7"}.fa-square:before{content:"\f0c8"}.fa-bars:before,.fa-navicon:befo
re,.fa-reorder:before{content:"\f0c9"}.fa-list-ul:before{content:"\f0ca"}.fa-list-ol:before{content:"\f0cb"}.fa-strikethrough:before{content:"\f0cc"}.fa-underline:before{content:"\f0cd"}.fa-table:before{content:"\f0ce"}.fa-magic:before{content:"\f0d0"}.fa-truck:before{content:"\f0d1"}.fa-pinterest:before{content:"\f0d2"}.fa-pinterest-square:before{content:"\f0d3"}.fa-google-plus-square:before{content:"\f0d4"}.fa-google-plus:before{content:"\f0d5"}.fa-money:before{content:"\f0d6"}.fa-caret-down:before{content:"\f0d7"}.fa-caret-up:before{content:"\f0d8"}.fa-caret-left:before{content:"\f0d9"}.fa-caret-right:before{content:"\f0da"}.fa-columns:before{content:"\f0db"}.fa-sort:before,.fa-unsorted:before{content:"\f0dc"}.fa-sort-desc:before,.fa-sort-down:before{content:"\f0dd"}.fa-sort-asc:before,.fa-sort-up:before{content:"\f0de"}.fa-envelope:before{content:"\f0e0"}.fa-linkedin:before{content:"\f0e1"}.fa-rotate-left:before,.fa-undo:before{content:"\f0e2"}.fa-gavel:before,.fa-legal:before{content:"\f0e3"}.fa-dashboard:before,.fa-tachometer:before{content:"\f0e4"}.fa-comment-o:before{content:"\f0e5"}.fa-comments-o:before{content:"\f0e6"}.fa-bolt:before,.fa-flash:before{content:"\f0e7"}.fa-sitemap:before{content:"\f0e8"}.fa-umbrella:before{content:"\f0e9"}.fa-clipboard:before,.fa-paste:before{content:"\f0ea"}.fa-lightbulb-o:before{content:"\f0eb"}.fa-exchange:before{content:"\f0ec"}.fa-cloud-download:before{content:"\f0ed"}.fa-cloud-upload:before{content:"\f0ee"}.fa-user-md:before{content:"\f0f0"}.fa-stethoscope:before{content:"\f0f1"}.fa-suitcase:before{content:"\f0f2"}.fa-bell-o:before{content:"\f0a2"}.fa-coffee:before{content:"\f0f4"}.fa-cutlery:before{content:"\f0f5"}.fa-file-text-o:before{content:"\f0f6"}.fa-building-o:before{content:"\f0f7"}.fa-hospital-o:before{content:"\f0f8"}.fa-ambulance:before{content:"\f0f9"}.fa-medkit:before{content:"\f0fa"}.fa-fighter-jet:before{content:"\f0fb"}.fa-beer:before{content:"\f0fc"}.fa-h-square:before{content:"\f0fd"}.fa-plus-square:before{content:"\f0fe"}.fa-angle-double-left:before{content:"\f100"}.fa-angle-double-right:before{content:"\f101"}.fa-angle-double-up:before{content:"\f102"}.fa-angle-double-down:before{content:"\f103"}.fa-angle-left:before{content:"\f104"}.fa-angle-right:before{content:"\f105"}.fa-angle-up:before{content:"\f106"}.fa-angle-down:before{content:"\f107"}.fa-desktop:before{content:"\f108"}.fa-laptop:before{content:"\f109"}.fa-tablet:before{content:"\f10a"}.fa-mobile-phone:before,.fa-mobile:before{content:"\f10b"}.fa-circle-o:before{content:"\f10c"}.fa-quote-left:before{content:"\f10d"}.fa-quote-right:before{content:"\f10e"}.fa-spinner:before{content:"\f110"}.fa-circle:before{content:"\f111"}.fa-mail-reply:before,.fa-reply:before{content:"\f112"}.fa-github-alt:before{content:"\f113"}.fa-folder-o:before{content:"\f114"}.fa-folder-open-o:before{content:"\f115"}.fa-smile-o:before{content:"\f118"}.fa-frown-o:before{content:"\f119"}.fa-meh-o:before{content:"\f11a"}.fa-gamepad:before{content:"\f11b"}.fa-keyboard-o:before{content:"\f11c"}.fa-flag-o:before{content:"\f11d"}.fa-flag-checkered:before{content:"\f11e"}.fa-terminal:before{content:"\f120"}.fa-code:before{content:"\f121"}.fa-mail-reply-all:before,.fa-reply-all:before{content:"\f122"}.fa-star-half-empty:before,.fa-star-half-full:before,.fa-star-half-o:before{content:"\f123"}.fa-location-arrow:before{content:"\f124"}.fa-crop:before{content:"\f125"}.fa-code-fork:before{content:"\f126"}.fa-chain-broken:before,.fa-unlink:before{content:"\f127"}.fa-question:before{content:"\f128"}.fa-info:b
efore{content:"\f129"}.fa-exclamation:before{content:"\f12a"}.fa-superscript:before{content:"\f12b"}.fa-subscript:before{content:"\f12c"}.fa-eraser:before{content:"\f12d"}.fa-puzzle-piece:before{content:"\f12e"}.fa-microphone:before{content:"\f130"}.fa-microphone-slash:before{content:"\f131"}.fa-shield:before{content:"\f132"}.fa-calendar-o:before{content:"\f133"}.fa-fire-extinguisher:before{content:"\f134"}.fa-rocket:before{content:"\f135"}.fa-maxcdn:before{content:"\f136"}.fa-chevron-circle-left:before{content:"\f137"}.fa-chevron-circle-right:before{content:"\f138"}.fa-chevron-circle-up:before{content:"\f139"}.fa-chevron-circle-down:before{content:"\f13a"}.fa-html5:before{content:"\f13b"}.fa-css3:before{content:"\f13c"}.fa-anchor:before{content:"\f13d"}.fa-unlock-alt:before{content:"\f13e"}.fa-bullseye:before{content:"\f140"}.fa-ellipsis-h:before{content:"\f141"}.fa-ellipsis-v:before{content:"\f142"}.fa-rss-square:before{content:"\f143"}.fa-play-circle:before{content:"\f144"}.fa-ticket:before{content:"\f145"}.fa-minus-square:before{content:"\f146"}.fa-minus-square-o:before{content:"\f147"}.fa-level-up:before{content:"\f148"}.fa-level-down:before{content:"\f149"}.fa-check-square:before{content:"\f14a"}.fa-pencil-square:before{content:"\f14b"}.fa-external-link-square:before{content:"\f14c"}.fa-share-square:before{content:"\f14d"}.fa-compass:before{content:"\f14e"}.fa-caret-square-o-down:before,.fa-toggle-down:before{content:"\f150"}.fa-caret-square-o-up:before,.fa-toggle-up:before{content:"\f151"}.fa-caret-square-o-right:before,.fa-toggle-right:before{content:"\f152"}.fa-eur:before,.fa-euro:before{content:"\f153"}.fa-gbp:before{content:"\f154"}.fa-dollar:before,.fa-usd:before{content:"\f155"}.fa-inr:before,.fa-rupee:before{content:"\f156"}.fa-cny:before,.fa-jpy:before,.fa-rmb:before,.fa-yen:before{content:"\f157"}.fa-rouble:before,.fa-rub:before,.fa-ruble:before{content:"\f158"}.fa-krw:before,.fa-won:before{content:"\f159"}.fa-bitcoin:before,.fa-btc:before{content:"\f15a"}.fa-file:before{content:"\f15b"}.fa-file-text:before{content:"\f15c"}.fa-sort-alpha-asc:before{content:"\f15d"}.fa-sort-alpha-desc:before{content:"\f15e"}.fa-sort-amount-asc:before{content:"\f160"}.fa-sort-amount-desc:before{content:"\f161"}.fa-sort-numeric-asc:before{content:"\f162"}.fa-sort-numeric-desc:before{content:"\f163"}.fa-thumbs-up:before{content:"\f164"}.fa-thumbs-down:before{content:"\f165"}.fa-youtube-square:before{content:"\f166"}.fa-youtube:before{content:"\f167"}.fa-xing:before{content:"\f168"}.fa-xing-square:before{content:"\f169"}.fa-youtube-play:before{content:"\f16a"}.fa-dropbox:before{content:"\f16b"}.fa-stack-overflow:before{content:"\f16c"}.fa-instagram:before{content:"\f16d"}.fa-flickr:before{content:"\f16e"}.fa-adn:before{content:"\f170"}.fa-bitbucket:before{content:"\f171"}.fa-bitbucket-square:before{content:"\f172"}.fa-tumblr:before{content:"\f173"}.fa-tumblr-square:before{content:"\f174"}.fa-long-arrow-down:before{content:"\f175"}.fa-long-arrow-up:before{content:"\f176"}.fa-long-arrow-left:before{content:"\f177"}.fa-long-arrow-right:before{content:"\f178"}.fa-apple:before{content:"\f179"}.fa-windows:before{content:"\f17a"}.fa-android:before{content:"\f17b"}.fa-linux:before{content:"\f17c"}.fa-dribbble:before{content:"\f17d"}.fa-skype:before{content:"\f17e"}.fa-foursquare:before{content:"\f180"}.fa-trello:before{content:"\f181"}.fa-female:before{content:"\f182"}.fa-male:before{content:"\f183"}.fa-gittip:before{content:"\f184"}.fa-sun-o:before{content:"\f185"}.fa-moon-o:before{content:"\f186"}.fa-a
rchive:before{content:"\f187"}.fa-bug:before{content:"\f188"}.fa-vk:before{content:"\f189"}.fa-weibo:before{content:"\f18a"}.fa-renren:before{content:"\f18b"}.fa-pagelines:before{content:"\f18c"}.fa-stack-exchange:before{content:"\f18d"}.fa-arrow-circle-o-right:before{content:"\f18e"}.fa-arrow-circle-o-left:before{content:"\f190"}.fa-caret-square-o-left:before,.fa-toggle-left:before{content:"\f191"}.fa-dot-circle-o:before{content:"\f192"}.fa-wheelchair:before{content:"\f193"}.fa-vimeo-square:before{content:"\f194"}.fa-try:before,.fa-turkish-lira:before{content:"\f195"}.fa-plus-square-o:before{content:"\f196"}.fa-space-shuttle:before{content:"\f197"}.fa-slack:before{content:"\f198"}.fa-envelope-square:before{content:"\f199"}.fa-wordpress:before{content:"\f19a"}.fa-openid:before{content:"\f19b"}.fa-bank:before,.fa-institution:before,.fa-university:before{content:"\f19c"}.fa-graduation-cap:before,.fa-mortar-board:before{content:"\f19d"}.fa-yahoo:before{content:"\f19e"}.fa-google:before{content:"\f1a0"}.fa-reddit:before{content:"\f1a1"}.fa-reddit-square:before{content:"\f1a2"}.fa-stumbleupon-circle:before{content:"\f1a3"}.fa-stumbleupon:before{content:"\f1a4"}.fa-delicious:before{content:"\f1a5"}.fa-digg:before{content:"\f1a6"}.fa-pied-piper-square:before,.fa-pied-piper:before{content:"\f1a7"}.fa-pied-piper-alt:before{content:"\f1a8"}.fa-drupal:before{content:"\f1a9"}.fa-joomla:before{content:"\f1aa"}.fa-language:before{content:"\f1ab"}.fa-fax:before{content:"\f1ac"}.fa-building:before{content:"\f1ad"}.fa-child:before{content:"\f1ae"}.fa-paw:before{content:"\f1b0"}.fa-spoon:before{content:"\f1b1"}.fa-cube:before{content:"\f1b2"}.fa-cubes:before{content:"\f1b3"}.fa-behance:before{content:"\f1b4"}.fa-behance-square:before{content:"\f1b5"}.fa-steam:before{content:"\f1b6"}.fa-steam-square:before{content:"\f1b7"}.fa-recycle:before{content:"\f1b8"}.fa-automobile:before,.fa-car:before{content:"\f1b9"}.fa-cab:before,.fa-taxi:before{content:"\f1ba"}.fa-tree:before{content:"\f1bb"}.fa-spotify:before{content:"\f1bc"}.fa-deviantart:before{content:"\f1bd"}.fa-soundcloud:before{content:"\f1be"}.fa-database:before{content:"\f1c0"}.fa-file-pdf-o:before{content:"\f1c1"}.fa-file-word-o:before{content:"\f1c2"}.fa-file-excel-o:before{content:"\f1c3"}.fa-file-powerpoint-o:before{content:"\f1c4"}.fa-file-image-o:before,.fa-file-photo-o:before,.fa-file-picture-o:before{content:"\f1c5"}.fa-file-archive-o:before,.fa-file-zip-o:before{content:"\f1c6"}.fa-file-audio-o:before,.fa-file-sound-o:before{content:"\f1c7"}.fa-file-movie-o:before,.fa-file-video-o:before{content:"\f1c8"}.fa-file-code-o:before{content:"\f1c9"}.fa-vine:before{content:"\f1ca"}.fa-codepen:before{content:"\f1cb"}.fa-jsfiddle:before{content:"\f1cc"}.fa-life-bouy:before,.fa-life-ring:before,.fa-life-saver:before,.fa-support:before{content:"\f1cd"}.fa-circle-o-notch:before{content:"\f1ce"}.fa-ra:before,.fa-rebel:before{content:"\f1d0"}.fa-empire:before,.fa-ge:before{content:"\f1d1"}.fa-git-square:before{content:"\f1d2"}.fa-git:before{content:"\f1d3"}.fa-hacker-news:before{content:"\f1d4"}.fa-tencent-weibo:before{content:"\f1d5"}.fa-qq:before{content:"\f1d6"}.fa-wechat:before,.fa-weixin:before{content:"\f1d7"}.fa-paper-plane:before,.fa-send:before{content:"\f1d8"}.fa-paper-plane-o:before,.fa-send-o:before{content:"\f1d9"}.fa-history:before{content:"\f1da"}.fa-circle-thin:before{content:"\f1db"}.fa-header:before{content:"\f1dc"}.fa-paragraph:before{content:"\f1dd"}.fa-sliders:before{content:"\f1de"}.fa-share-alt:before{content:"\f1e0"}.fa-share-alt-square:b
efore{content:"\f1e1"}.fa-bomb:before{content:"\f1e2"}.book-langs-index{width:100%;height:100%;padding:40px 0;margin:0;overflow:auto}@media (max-width:600px){.book-langs-index{padding:0}}.book-langs-index .inner{max-width:600px;width:100%;margin:0 auto;padding:30px;background:#fff;border-radius:3px}.book-langs-index .inner h3{margin:0}.book-langs-index .inner .languages{list-style:none;padding:20px 30px;margin-top:20px;border-top:1px solid #eee}.book-langs-index .inner .languages:after,.book-langs-index .inner .languages:before{content:" ";display:table;line-height:0}.book-langs-index .inner .languages li{width:50%;float:left;padding:10px 5px;font-size:16px}@media (max-width:600px){.book-langs-index .inner .languages li{width:100%;max-width:100%}}.book .book-header{overflow:visible;height:50px;padding:0 8px;z-index:2;font-size:.85em;color:#7e888b;background:0 0}.book .book-header .btn{display:block;height:50px;padding:0 15px;border-bottom:none;color:#ccc;text-transform:uppercase;line-height:50px;-webkit-box-shadow:none!important;box-shadow:none!important;position:relative;font-size:14px}.book .book-header .btn:hover{position:relative;text-decoration:none;color:#444;background:0 0}.book .book-header h1{margin:0;font-size:20px;font-weight:200;text-align:center;line-height:50px;opacity:0;padding-left:200px;padding-right:200px;-webkit-transition:opacity .2s ease;-moz-transition:opacity .2s ease;-o-transition:opacity .2s ease;transition:opacity .2s ease;overflow:hidden;text-overflow:ellipsis;white-space:nowrap}.book .book-header h1 a,.book .book-header h1 a:hover{color:inherit;text-decoration:none}@media (max-width:1000px){.book .book-header h1{display:none}}.book .book-header h1 i{display:none}.book .book-header:hover h1{opacity:1}.book.is-loading .book-header h1 i{display:inline-block}.book.is-loading .book-header h1 a{display:none}.dropdown{position:relative}.dropdown-menu{position:absolute;top:100%;left:0;z-index:100;display:none;float:left;min-width:160px;padding:0;margin:2px 0 0;list-style:none;font-size:14px;background-color:#fafafa;border:1px solid rgba(0,0,0,.07);border-radius:1px;-webkit-box-shadow:0 6px 12px rgba(0,0,0,.175);box-shadow:0 6px 12px rgba(0,0,0,.175);background-clip:padding-box}.dropdown-menu.open{display:block}.dropdown-menu.dropdown-left{left:auto;right:4%}.dropdown-menu.dropdown-left .dropdown-caret{right:14px;left:auto}.dropdown-menu .dropdown-caret{position:absolute;top:-8px;left:14px;width:18px;height:10px;float:left;overflow:hidden}.dropdown-menu .dropdown-caret .caret-inner,.dropdown-menu .dropdown-caret .caret-outer{display:inline-block;top:0;border-left:9px solid transparent;border-right:9px solid transparent;position:absolute}.dropdown-menu .dropdown-caret .caret-outer{border-bottom:9px solid rgba(0,0,0,.1);height:auto;left:0;width:auto;margin-left:-1px}.dropdown-menu .dropdown-caret .caret-inner{margin-top:-1px;top:1px;border-bottom:9px solid #fafafa}.dropdown-menu .buttons{border-bottom:1px solid rgba(0,0,0,.07)}.dropdown-menu .buttons:after,.dropdown-menu .buttons:before{content:" ";display:table;line-height:0}.dropdown-menu .buttons:last-child{border-bottom:none}.dropdown-menu .buttons .button{border:0;background-color:transparent;color:#a6a6a6;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 4px}.alert,.dropdown-menu .buttons .button:hover{color:#444}.dropdown-menu .buttons .button:focus,.dropdown-menu .buttons .button:hover{outline:0}.dropdown-menu .buttons .button.size-2{width:50%}.dropdown-menu .buttons 
.button.size-3{width:33%}.alert{padding:15px;margin-bottom:20px;background:#eee;border-bottom:5px solid #ddd}.alert-success{background:#dff0d8;border-color:#d6e9c6;color:#3c763d}.alert-info{background:#d9edf7;border-color:#bce8f1;color:#31708f}.alert-danger{background:#f2dede;border-color:#ebccd1;color:#a94442}.alert-warning{background:#fcf8e3;border-color:#faebcc;color:#8a6d3b}.book .book-summary{position:absolute;top:0;left:-300px;bottom:0;z-index:1;width:300px;color:#364149;background:#fafafa;border-right:1px solid rgba(0,0,0,.07);-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms ease}.book .book-summary ul.summary{position:absolute;top:0;left:0;right:0;bottom:0;overflow-y:auto;list-style:none;margin:0;padding:0;-webkit-transition:top .5s ease;-moz-transition:top .5s ease;-o-transition:top .5s ease;transition:top .5s ease}.book .book-summary ul.summary li{list-style:none}.book .book-summary ul.summary li.divider{height:1px;margin:7px 0;overflow:hidden;background:rgba(0,0,0,.07)}.book .book-summary ul.summary li i.fa-check{display:none;position:absolute;right:9px;top:16px;font-size:9px;color:#3c3}.book .book-summary ul.summary li.done>a{color:#364149;font-weight:400}.book .book-summary ul.summary li.done>a i{display:inline}.book .book-summary ul.summary li a,.book .book-summary ul.summary li span{display:block;padding:10px 15px;border-bottom:none;color:#364149;background:0 0;text-overflow:ellipsis;overflow:hidden;white-space:nowrap;position:relative}.book .book-summary ul.summary li span{cursor:not-allowed;opacity:.3;filter:alpha(opacity=30)}.book .book-summary ul.summary li a:hover,.book .book-summary ul.summary li.active>a{color:#008cff;background:0 0;text-decoration:none}.book .book-summary ul.summary li ul{padding-left:20px}@media (max-width:600px){.book .book-summary{width:calc(100% - 60px);bottom:0;left:-100%}}.book.with-summary .book-summary{left:0}.book.without-animation .book-summary{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.book{position:relative;width:100%;height:100%}.book .book-body,.book .book-body .body-inner{position:absolute;top:0;left:0;overflow-y:auto;bottom:0;right:0}.book .book-body{color:#000;background:#fff;-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms ease}.book .book-body .page-wrapper{position:relative;outline:0}.book .book-body .page-wrapper .page-inner{max-width:800px;margin:0 auto;padding:20px 0 40px}.book .book-body .page-wrapper .page-inner section{margin:0;padding:5px 15px;background:#fff;border-radius:2px;line-height:1.7;font-size:1.6rem}.book .book-body .page-wrapper .page-inner .btn-group .btn{border-radius:0;background:#eee;border:0}@media (max-width:1240px){.book .book-body{-webkit-transition:-webkit-transform 250ms ease;-moz-transition:-moz-transform 250ms ease;-o-transition:-o-transform 250ms ease;transition:transform 250ms ease;padding-bottom:20px}.book .book-body .body-inner{position:static;min-height:calc(100% - 50px)}}@media (min-width:600px){.book.with-summary .book-body{left:300px}}@media (max-width:600px){.book.with-summary{overflow:hidden}.book.with-summary .book-body{-webkit-transform:translate(calc(100% - 60px),0);-moz-transform:translate(calc(100% - 60px),0);-ms-transform:translate(calc(100% - 60px),0);-o-transform:translate(calc(100% - 60px),0);transform:translate(calc(100% - 60px),0)}}.book.without-animation 
.book-body{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.buttons:after,.buttons:before{content:" ";display:table;line-height:0}.button{border:0;background:#eee;color:#666;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 4px}.button:hover{color:#444}.button:focus,.button:hover{outline:0}.button.size-2{width:50%}.button.size-3{width:33%}.book .book-body .page-wrapper .page-inner section{display:none}.book .book-body .page-wrapper .page-inner section.normal{display:block;word-wrap:break-word;overflow:hidden;color:#333;line-height:1.7;text-size-adjust:100%;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;-moz-text-size-adjust:100%}.book .book-body .page-wrapper .page-inner section.normal *{box-sizing:border-box;-webkit-box-sizing:border-box;}.book .book-body .page-wrapper .page-inner section.normal>:first-child{margin-top:0!important}.book .book-body .page-wrapper .page-inner section.normal>:last-child{margin-bottom:0!important}.book .book-body .page-wrapper .page-inner section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal figure,.book .book-body .page-wrapper .page-inner section.normal img,.book .book-body .page-wrapper .page-inner section.normal pre,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal tr{page-break-inside:avoid}.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal p{orphans:3;widows:3}.book .book-body .page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5{page-break-after:avoid}.book .book-body .page-wrapper .page-inner section.normal b,.book .book-body .page-wrapper .page-inner section.normal strong{font-weight:700}.book .book-body .page-wrapper .page-inner section.normal em{font-style:italic}.book .book-body .page-wrapper .page-inner section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal dl,.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal p,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal ul{margin-top:0;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal a{color:#4183c4;text-decoration:none;background:0 0}.book .book-body .page-wrapper .page-inner section.normal a:active,.book .book-body .page-wrapper .page-inner section.normal a:focus,.book .book-body .page-wrapper .page-inner section.normal a:hover{outline:0;text-decoration:underline}.book .book-body .page-wrapper .page-inner section.normal img{border:0;max-width:100%}.book .book-body .page-wrapper .page-inner section.normal hr{height:4px;padding:0;margin:1.7em 0;overflow:hidden;background-color:#e7e7e7;border:none}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book .book-body .page-wrapper .page-inner section.normal hr:before{display:table;content:" "}.book .book-body 
.page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal h6{margin-top:1.275em;margin-bottom:.85em;}.book .book-body .page-wrapper .page-inner section.normal h1{font-size:2em}.book .book-body .page-wrapper .page-inner section.normal h2{font-size:1.75em}.book .book-body .page-wrapper .page-inner section.normal h3{font-size:1.5em}.book .book-body .page-wrapper .page-inner section.normal h4{font-size:1.25em}.book .book-body .page-wrapper .page-inner section.normal h5{font-size:1em}.book .book-body .page-wrapper .page-inner section.normal h6{font-size:1em;color:#777}.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal pre{font-family:Consolas,"Liberation Mono",Menlo,Courier,monospace;direction:ltr;border:none;color:inherit}.book .book-body .page-wrapper .page-inner section.normal pre{overflow:auto;word-wrap:normal;margin:0 0 1.275em;padding:.85em 1em;background:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal pre>code{display:inline;max-width:initial;padding:0;margin:0;overflow:initial;line-height:inherit;font-size:.85em;white-space:pre;background:0 0}.book .book-body .page-wrapper .page-inner section.normal pre>code:after,.book .book-body .page-wrapper .page-inner section.normal pre>code:before{content:normal}.book .book-body .page-wrapper .page-inner section.normal code{padding:.2em;margin:0;font-size:.85em;background-color:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal code:after,.book .book-body .page-wrapper .page-inner section.normal code:before{letter-spacing:-.2em;content:"\00a0"}.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal ul{padding:0 0 0 2em;margin:0 0 .85em}.book .book-body .page-wrapper .page-inner section.normal ol ol,.book .book-body .page-wrapper .page-inner section.normal ol ul,.book .book-body .page-wrapper .page-inner section.normal ul ol,.book .book-body .page-wrapper .page-inner section.normal ul ul{margin-top:0;margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal ol ol{list-style-type:lower-roman}.book .book-body .page-wrapper .page-inner section.normal blockquote{margin:0 0 .85em;padding:0 15px;opacity:0.75;border-left:4px solid #dcdcdc}.book .book-body .page-wrapper .page-inner section.normal blockquote:first-child{margin-top:0}.book .book-body .page-wrapper .page-inner section.normal blockquote:last-child{margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal dl{padding:0}.book .book-body .page-wrapper .page-inner section.normal dl dt{padding:0;margin-top:.85em;font-style:italic;font-weight:700}.book .book-body .page-wrapper .page-inner section.normal dl dd{padding:0 .85em;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal dd{margin-left:0}.book .book-body .page-wrapper .page-inner section.normal .glossary-term{cursor:help;text-decoration:underline}.book .book-body .navigation{position:absolute;top:50px;bottom:0;margin:0;max-width:150px;min-width:90px;display:flex;justify-content:center;align-content:center;flex-direction:column;font-size:40px;color:#ccc;text-align:center;-webkit-transition:all 350ms ease;-moz-transition:all 350ms 
ease;-o-transition:all 350ms ease;transition:all 350ms ease}.book .book-body .navigation:hover{text-decoration:none;color:#444}.book .book-body .navigation.navigation-next{right:0}.book .book-body .navigation.navigation-prev{left:0}@media (max-width:1240px){.book .book-body .navigation{position:static;top:auto;max-width:50%;width:50%;display:inline-block;float:left}.book .book-body .navigation.navigation-unique{max-width:100%;width:100%}}.book .book-body .page-wrapper .page-inner section.glossary{margin-bottom:40px}.book .book-body .page-wrapper .page-inner section.glossary h2 a,.book .book-body .page-wrapper .page-inner section.glossary h2 a:hover{color:inherit;text-decoration:none}.book .book-body .page-wrapper .page-inner section.glossary .glossary-index{list-style:none;margin:0;padding:0}.book .book-body .page-wrapper .page-inner section.glossary .glossary-index li{display:inline;margin:0 8px;white-space:nowrap}*{-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box;-webkit-overflow-scrolling:touch;-webkit-tap-highlight-color:transparent;-webkit-text-size-adjust:none;-webkit-touch-callout:none}a{text-decoration:none}body,html{height:100%}html{font-size:62.5%}body{text-rendering:optimizeLegibility;font-smoothing:antialiased;font-family:"Helvetica Neue",Helvetica,Arial,sans-serif;font-size:14px;letter-spacing:.2px;text-size-adjust:100%} +.book .book-summary ul.summary li a span {display:inline;padding:initial;overflow:visible;cursor:auto;opacity:1;} diff --git a/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/js/app.min.js b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/js/app.min.js new file mode 100644 index 000000000..9ace197e9 --- /dev/null +++ b/docs/previous_versions/v0.4.0/libs/gitbook-2.6.7/js/app.min.js @@ -0,0 +1,6 @@ +(function e(t,n,r){function s(o,u){if(!n[o]){if(!t[o]){var a=typeof require=="function"&&require;if(!u&&a)return a(o,!0);if(i)return i(o,!0);var f=new Error("Cannot find module '"+o+"'");throw f.code="MODULE_NOT_FOUND",f}var l=n[o]={exports:{}};t[o][0].call(l.exports,function(e){var n=t[o][1][e];return s(n?n:e)},l,l.exports,e,t,n,r)}return n[o].exports}var i=typeof require=="function"&&require;for(var o=0;o"'`]/g,reHasEscapedHtml=RegExp(reEscapedHtml.source),reHasUnescapedHtml=RegExp(reUnescapedHtml.source);var reEscape=/<%-([\s\S]+?)%>/g,reEvaluate=/<%([\s\S]+?)%>/g,reInterpolate=/<%=([\s\S]+?)%>/g;var reIsDeepProp=/\.|\[(?:[^[\]]*|(["'])(?:(?!\1)[^\n\\]|\\.)*?\1)\]/,reIsPlainProp=/^\w*$/,rePropName=/[^.[\]]+|\[(?:(-?\d+(?:\.\d+)?)|(["'])((?:(?!\2)[^\n\\]|\\.)*?)\2)\]/g;var reRegExpChars=/^[:!,]|[\\^$.*+?()[\]{}|\/]|(^[0-9a-fA-Fnrtuvx])|([\n\r\u2028\u2029])/g,reHasRegExpChars=RegExp(reRegExpChars.source);var reComboMark=/[\u0300-\u036f\ufe20-\ufe23]/g;var reEscapeChar=/\\(\\)?/g;var reEsTemplate=/\$\{([^\\}]*(?:\\.[^\\}]*)*)\}/g;var reFlags=/\w*$/;var reHasHexPrefix=/^0[xX]/;var reIsHostCtor=/^\[object .+?Constructor\]$/;var reIsUint=/^\d+$/;var reLatin1=/[\xc0-\xd6\xd8-\xde\xdf-\xf6\xf8-\xff]/g;var reNoMatch=/($^)/;var reUnescapedString=/['\n\r\u2028\u2029\\]/g;var reWords=function(){var upper="[A-Z\\xc0-\\xd6\\xd8-\\xde]",lower="[a-z\\xdf-\\xf6\\xf8-\\xff]+";return RegExp(upper+"+(?="+upper+lower+")|"+upper+"?"+lower+"|"+upper+"+|[0-9]+","g")}();var 
contextProps=["Array","ArrayBuffer","Date","Error","Float32Array","Float64Array","Function","Int8Array","Int16Array","Int32Array","Math","Number","Object","RegExp","Set","String","_","clearTimeout","isFinite","parseFloat","parseInt","setTimeout","TypeError","Uint8Array","Uint8ClampedArray","Uint16Array","Uint32Array","WeakMap"];var templateCounter=-1;var typedArrayTags={};typedArrayTags[float32Tag]=typedArrayTags[float64Tag]=typedArrayTags[int8Tag]=typedArrayTags[int16Tag]=typedArrayTags[int32Tag]=typedArrayTags[uint8Tag]=typedArrayTags[uint8ClampedTag]=typedArrayTags[uint16Tag]=typedArrayTags[uint32Tag]=true;typedArrayTags[argsTag]=typedArrayTags[arrayTag]=typedArrayTags[arrayBufferTag]=typedArrayTags[boolTag]=typedArrayTags[dateTag]=typedArrayTags[errorTag]=typedArrayTags[funcTag]=typedArrayTags[mapTag]=typedArrayTags[numberTag]=typedArrayTags[objectTag]=typedArrayTags[regexpTag]=typedArrayTags[setTag]=typedArrayTags[stringTag]=typedArrayTags[weakMapTag]=false;var cloneableTags={};cloneableTags[argsTag]=cloneableTags[arrayTag]=cloneableTags[arrayBufferTag]=cloneableTags[boolTag]=cloneableTags[dateTag]=cloneableTags[float32Tag]=cloneableTags[float64Tag]=cloneableTags[int8Tag]=cloneableTags[int16Tag]=cloneableTags[int32Tag]=cloneableTags[numberTag]=cloneableTags[objectTag]=cloneableTags[regexpTag]=cloneableTags[stringTag]=cloneableTags[uint8Tag]=cloneableTags[uint8ClampedTag]=cloneableTags[uint16Tag]=cloneableTags[uint32Tag]=true;cloneableTags[errorTag]=cloneableTags[funcTag]=cloneableTags[mapTag]=cloneableTags[setTag]=cloneableTags[weakMapTag]=false;var deburredLetters={"À":"A","Á":"A","Â":"A","Ã":"A","Ä":"A","Å":"A","à":"a","á":"a","â":"a","ã":"a","ä":"a","å":"a","Ç":"C","ç":"c","Ð":"D","ð":"d","È":"E","É":"E","Ê":"E","Ë":"E","è":"e","é":"e","ê":"e","ë":"e","Ì":"I","Í":"I","Î":"I","Ï":"I","ì":"i","í":"i","î":"i","ï":"i","Ñ":"N","ñ":"n","Ò":"O","Ó":"O","Ô":"O","Õ":"O","Ö":"O","Ø":"O","ò":"o","ó":"o","ô":"o","õ":"o","ö":"o","ø":"o","Ù":"U","Ú":"U","Û":"U","Ü":"U","ù":"u","ú":"u","û":"u","ü":"u","Ý":"Y","ý":"y","ÿ":"y","Æ":"Ae","æ":"ae","Þ":"Th","þ":"th","ß":"ss"};var htmlEscapes={"&":"&","<":"<",">":">",'"':""","'":"'","`":"`"};var htmlUnescapes={"&":"&","<":"<",">":">",""":'"',"'":"'","`":"`"};var objectTypes={"function":true,object:true};var regexpEscapes={0:"x30",1:"x31",2:"x32",3:"x33",4:"x34",5:"x35",6:"x36",7:"x37",8:"x38",9:"x39",A:"x41",B:"x42",C:"x43",D:"x44",E:"x45",F:"x46",a:"x61",b:"x62",c:"x63",d:"x64",e:"x65",f:"x66",n:"x6e",r:"x72",t:"x74",u:"x75",v:"x76",x:"x78"};var stringEscapes={"\\":"\\","'":"'","\n":"n","\r":"r","\u2028":"u2028","\u2029":"u2029"};var freeExports=objectTypes[typeof exports]&&exports&&!exports.nodeType&&exports;var freeModule=objectTypes[typeof module]&&module&&!module.nodeType&&module;var freeGlobal=freeExports&&freeModule&&typeof global=="object"&&global&&global.Object&&global;var freeSelf=objectTypes[typeof self]&&self&&self.Object&&self;var freeWindow=objectTypes[typeof window]&&window&&window.Object&&window;var moduleExports=freeModule&&freeModule.exports===freeExports&&freeExports;var root=freeGlobal||freeWindow!==(this&&this.window)&&freeWindow||freeSelf||this;function baseCompareAscending(value,other){if(value!==other){var valIsNull=value===null,valIsUndef=value===undefined,valIsReflexive=value===value;var othIsNull=other===null,othIsUndef=other===undefined,othIsReflexive=other===other;if(value>other&&!othIsNull||!valIsReflexive||valIsNull&&!othIsUndef&&othIsReflexive||valIsUndef&&othIsReflexive){return 1}if(value-1){}return index}function 
diff --git a/previous_versions/v0.4.0/9-confidence-intervals.html b/previous_versions/v0.4.0/9-confidence-intervals.html
new file mode 100644
index 000000000..00a84b19c
--- /dev/null
+++ b/previous_versions/v0.4.0/9-confidence-intervals.html
@@ -0,0 +1,1806 @@
+ 9 Confidence Intervals | An Introduction to Statistical and Data Sciences via R
      9 Confidence Intervals

      +

In Chapter 8, we explored the process of taking repeated samples from a population to build a sampling distribution. The motivation there was to use multiple samples from the same population to visualize and attempt to understand the variability in the statistic from one sample to another. Furthermore, recall our concepts and terminology related to sampling from the beginning of Chapter 8:

      +

Generally speaking, we learned that if a sample of size \(n\) is collected at random, then the resulting sample is unbiased and representative of the population, thus any result based on the sample can generalize to the population, and hence the point estimate/sample statistic computed from this sample is a “good guess” of the unknown population parameter of interest.

      +

      Specific to the bowl, we learned that if we properly mix the balls first thereby ensuring the randomness of samples extracted using the shovel with \(n=50\) slots, then the contents of the shovel will “look like” the contents of the bowl, thus any results based on the sample of \(n=50\) balls can generalize to the large bowl of \(N=2400\) balls, and hence the sample proportion red \(\widehat{p}\) of the \(n=50\) balls in the shovel is a “good guess” of the true population proportion red \(p\) of the \(N=2400\) balls in the bowl.

      +

      We emphasize that we used a point estimate/sample statistic, in this case the sample proportion \(\widehat{p}\), to estimate the unknown value of the population parameter, in this case the population proportion \(p\). In other words, we are using the sample to infer about the population.

      +

We can however consider inferential situations other than just those involving proportions. We present a wide array of such scenarios in the table below. In all 7 cases, the point estimate/sample statistic estimates the unknown population parameter. It does so by computing summary statistics based on a sample of size \(n\), as in the short code sketch following the table.

| Scenario | Population parameter | Population Notation | Point estimate/sample statistic | Sample Notation |
|---|---|---|---|---|
| 1 | Population proportion | \(p\) | Sample proportion | \(\widehat{p}\) |
| 2 | Population mean | \(\mu\) | Sample mean | \(\overline{x}\) |
| 3 | Difference in population proportions | \(p_1 - p_2\) | Difference in sample proportions | \(\widehat{p}_1 - \widehat{p}_2\) |
| 4 | Difference in population means | \(\mu_1 - \mu_2\) | Difference in sample means | \(\overline{x}_1 - \overline{x}_2\) |
| 5 | Population standard deviation | \(\sigma\) | Sample standard deviation | \(s\) |
| 6 | Population regression intercept | \(\beta_0\) | Sample regression intercept | \(\widehat{\beta}_0\) or \(b_0\) |
| 7 | Population regression slope | \(\beta_1\) | Sample regression slope | \(\widehat{\beta}_1\) or \(b_1\) |
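For concreteness, here is a minimal dplyr sketch of scenario 2, using the pennies_sample data frame from the moderndive package that we explore later in this chapter; scenario 1 works the same way, just with a categorical variable and a proportion instead of a mean. The object name x_bar is only illustrative.

# Scenario 2: the sample mean is the point estimate of the unknown population mean
pennies_sample %>%
  summarize(x_bar = mean(age_in_2011))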

We’ll cover the first four scenarios in this chapter on confidence intervals and in the following chapter on hypothesis testing:

      +
        +
• Scenario 2 about means. Ex: the average age of pennies.
• Scenario 3 about differences in proportions between two groups. Ex: the difference in high school completion rates for Canadians vs non-Canadians. We call this a situation of two-sample inference.
• Scenario 4 is similar to 3, but it’s about the means of two groups. Ex: the difference in average test scores for the morning section of a class versus the afternoon section of a class. This is another situation of two-sample inference.
      +

In contrast to these, Scenario 5 involves a measure of spread: the standard deviation. Does the spread/variability of a sample match the spread/variability of the population? We leave this topic, however, for an intermediate course on statistical inference.

      +

In Chapter 11 on inference for regression, we’ll cover Scenarios 6 & 7 about the regression line. In particular, we’ll see that the fitted regression line from Chapter 6 on basic regression, \(\widehat{y} = b_0 + b_1 \cdot x\), is in fact an estimate of some true population regression line \(y = \beta_0 + \beta_1 \cdot x\) based on a sample of \(n\) pairs of points \((x, y)\). Ex: Recall our sample of \(n=463\) instructors at UT Austin from the evals data set in Chapter 6. Based on the results of the fitted regression model of teaching score with beauty score as an explanatory/predictor variable, what can we say about this relationship for all instructors, not just those at UT Austin?

      +

In most cases, we don’t have the population values as we did with the bowl of balls. We only have a single sample of data from a larger population. We’d like to be able to use that single sample to make reasonable guesses about population parameters, that is, to create a range of plausible values for a population parameter. This range of plausible values is known as a confidence interval and will be the focus of this chapter. And how do we use a single sample to get some idea of how other samples might vary in terms of their statistic values? One common way this is done is via a process known as bootstrapping, which will be the focus of the beginning sections of this chapter.

      +
      +

      Needed packages

      +

      Let’s load all the packages needed for this chapter (this assumes you’ve already installed them). If needed, read Section 2.3 for information on how to install and load R packages.

      +
      library(dplyr)
      +library(ggplot2)
      +library(janitor)
      +library(moderndive)
      +library(infer)
      +
      +
      +

      DataCamp

      +

      Our approach of using data science tools to understand the first major component of statistical inference, confidence intervals, uses the same tools as in Mine Cetinkaya-Rundel and Andrew Bray’s DataCamp courses “Inference for Numerical Data” and “Inference for Categorical Data.” If you’re interested in complementing your learning below in an interactive online environment, click on the images below to access the courses.

      +
[Images linking to the DataCamp courses “Inference for Numerical Data” and “Inference for Categorical Data”]
      +
      +
      +

      9.1 Bootstrapping

      +
      +

      9.1.1 Data explanation

      +

      The moderndive package contains a sample of 40 pennies collected and minted in the United States. Let’s explore this sample data first:

      +
      pennies_sample
      +
      # A tibble: 40 x 2
      +    year age_in_2011
      +   <int>       <int>
      + 1  2005           6
      + 2  1981          30
      + 3  1977          34
      + 4  1992          19
      + 5  2005           6
      + 6  2006           5
      + 7  2000          11
      + 8  1992          19
      + 9  1988          23
      +10  1996          15
      +# … with 30 more rows
      +

The pennies_sample data frame has 40 rows, each corresponding to a single penny, with two variables:

      +
        +
• year of minting as shown on the penny and
• age_in_2011 giving the age of the penny in 2011, i.e., the number of years it had been in circulation, as an integer, e.g. 15, 2, etc.
      +

      Suppose we are interested in understanding some properties of the mean age of all US pennies from this data collected in 2011. How might we go about that? Let’s begin by understanding some of the properties of pennies_sample using data wrangling from Chapter 5 and data visualization from Chapter 3.

      +
      +
      +

      9.1.2 Exploratory data analysis

      +

      First, let’s visualize the values in this sample as a histogram:

      +
      ggplot(pennies_sample, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +

We see a roughly symmetric distribution here that has quite a few values near 20 years in age with only a few larger than 40 years or smaller than 5 years. If pennies_sample is a representative sample from the population, we’d expect the distribution of the ages of all US pennies in 2011 to have a similar shape, a similar spread, and similar measures of central tendency like the mean.

      +

So where does the mean value fall for this sample? This value will be known as our point estimate and provides us with a single number that could serve as a guess of what the true population mean age might be. Recall how to find this using the dplyr package:

      +
      x_bar <- pennies_sample %>% 
      +  summarize(stat = mean(age_in_2011))
      +x_bar
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  25.1
      +

We’ve denoted this sample mean as \(\bar{x}\), which is the standard symbol for denoting the mean of a sample. Our point estimate is, thus, \(\bar{x} = 25.1\). Note, though, that this is just one sample, providing just one guess at the population mean. What if we’d like to have another guess?

      +

This should all sound similar to what we did in Chapter 8. There, instead of collecting just a single scoop of balls, we had many different students use the shovel to scoop different samples of red and white balls. We then calculated a sample statistic (the sample proportion) from each sample. But we don’t have a population to pull from here with the pennies. We only have this one sample.

      +

The process of bootstrapping allows us to use a single sample to generate many different samples, which act as our way of approximating the sampling distribution via a created bootstrap distribution instead. We will pull ourselves up by our bootstraps using a single sample (pennies_sample) to get an idea of the grander sampling distribution.

      +
      +
      +

      9.1.3 The Bootstrapping Process

      +

      Bootstrapping uses a process of sampling with replacement from our original sample to create new bootstrap samples of the same size as our original sample. We can again make use of the rep_sample_n() function to explore what one such bootstrap sample would look like. Remember that we are randomly sampling from the original sample here with replacement and that we always use the same sample size for the bootstrap samples as the size of the original sample (pennies_sample).

      +
      bootstrap_sample1 <- pennies_sample %>% 
      +  rep_sample_n(size = 40, replace = TRUE, reps = 1)
      +bootstrap_sample1
      +
      # A tibble: 40 x 3
      +# Groups:   replicate [1]
      +   replicate  year age_in_2011
      +       <int> <int>       <int>
      + 1         1  1983          28
      + 2         1  2000          11
      + 3         1  2004           7
      + 4         1  1981          30
      + 5         1  1993          18
      + 6         1  2006           5
      + 7         1  1981          30
      + 8         1  2004           7
      + 9         1  1992          19
      +10         1  1994          17
      +# … with 30 more rows
      +

      Let’s visualize what this new bootstrap sample looks like:

      +
      ggplot(bootstrap_sample1, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +

We now have another sample that we can treat as if it came from the population of interest. We can similarly calculate the sample mean of this bootstrap sample, called a bootstrap statistic.

      +
      bootstrap_sample1 %>% 
      +  summarize(stat = mean(age_in_2011))
      +
      # A tibble: 1 x 2
      +  replicate  stat
      +      <int> <dbl>
      +1         1  23.2
      +

      We can see that this sample mean is smaller than the x_bar value we calculated earlier for the pennies_sample data. We’ll come back to analyzing the different bootstrap statistic values shortly.

      +

      Let’s recap what was done to get to this bootstrap sample using a tactile explanation:

      +
        +
1. First, pretend that each of the 40 values of age_in_2011 in pennies_sample were written on a small piece of paper. Recall that these values were 6, 30, 34, 19, 6, etc.
2. Now, put the 40 small pieces of paper into a receptacle such as a baseball cap.
3. Shake up the pieces of paper.
4. Draw “at random” from the cap to select one piece of paper.
5. Write down the value on this piece of paper. Say that it is 28.
6. Now, place this piece of paper containing 28 back into the cap.
7. Draw “at random” again from the cap to select a piece of paper. Note that this is the sampling with replacement part since you may draw 28 again.
8. Repeat this process until you have drawn 40 pieces of paper and written down the values on these 40 pieces of paper. Completing this repetition produces ONE bootstrap sample.
      +

      If you look at the values in bootstrap_sample1, you can see how this process plays out. We originally drew 28, then we drew 11, then 7, and so on. Of course, we didn’t actually use pieces of paper and a cap here. We just had the computer perform this process for us to produce bootstrap_sample1 using rep_sample_n() with replace = TRUE set.
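If it helps to connect the tactile description to code, here is a minimal base R sketch of drawing one such bootstrap sample "by hand." The object names slips_of_paper and one_bootstrap are made up for illustration, and the draws will differ from bootstrap_sample1 since a different random mechanism is used.

# A sketch of one tactile bootstrap sample using base R's sample()
# (illustrative only; rep_sample_n() with replace = TRUE is what was used above)
slips_of_paper <- pennies_sample$age_in_2011   # the 40 ages written on paper slips
one_bootstrap  <- sample(slips_of_paper, size = 40, replace = TRUE)
mean(one_bootstrap)                            # one bootstrap statistic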

      +

      The process of sampling with replacement is how we can use the original sample to take a guess as to what other values in the population may be. Sometimes in these bootstrap samples, we will select lots of larger values from the original sample, sometimes we will select lots of smaller values, and most frequently we will select values that are near the center of the sample. Let’s explore what the distribution of values of age_in_2011 for six different bootstrap samples looks like to further understand this variability.

      +
      six_bootstrap_samples <- pennies_sample %>% 
      +  rep_sample_n(size = 40, replace = TRUE, reps = 6)
      +
      ggplot(six_bootstrap_samples, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white") +
      +  facet_wrap(~ replicate)
      +

      +

      We can also look at the six different means using dplyr syntax:

      +
      six_bootstrap_samples %>% 
      +  group_by(replicate) %>% 
      +  summarize(stat = mean(age_in_2011))
      +
      # A tibble: 6 x 2
      +  replicate  stat
      +      <int> <dbl>
      +1         1  23.6
      +2         2  24.1
      +3         3  25.2
      +4         4  23.1
      +5         5  24.0
      +6         6  24.7
      +

      Instead of doing this six times, we could do it 1000 times and then look at the distribution of stat across all 1000 of the replicates. This sets the stage for the infer R package (Bray et al. 2019) that was created to help users perform statistical inference such as confidence intervals and hypothesis tests using verbs similar to what you’ve seen with dplyr. We’ll walk through setting up each of the infer verbs for confidence intervals using this pennies_sample example, while also explaining the purpose of the verbs in a general framework.
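Before turning to infer, here is a sketch of what those 1000 replicates would look like using only the rep_sample_n() and dplyr tools from above; the name virtual_bootstrap_means and the bin count are arbitrary choices for illustration.

virtual_bootstrap_means <- pennies_sample %>% 
  rep_sample_n(size = 40, replace = TRUE, reps = 1000) %>% 
  group_by(replicate) %>% 
  summarize(stat = mean(age_in_2011))

# Histogram of the 1000 bootstrap means
ggplot(virtual_bootstrap_means, aes(x = stat)) +
  geom_histogram(bins = 10, color = "white")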

      +
      +
      +
      +

      9.2 The infer package for statistical inference

      +

      The infer package makes great use of the %>% to create a pipeline for statistical inference. The goal of the package is to provide a way for its users to explain the computational process of confidence intervals and hypothesis tests using the code as a guide. The verbs build in order here, so you’ll want to start with specify() and then continue through the others as needed.

      +
      +

      9.2.1 Specify variables

      +

      +

The specify() function is used primarily to choose which variables will be the focus of the statistical inference. In addition, this is where you set which variable will act as the explanatory variable and which will act as the response variable. For proportion problems similar to those in Chapter 8, we can also specify which of the different levels we would like to count as a success. We’ll see further examples of these options in this chapter, Chapter 10, and in Appendix B.

      +

      To begin to create a confidence interval for the population mean age of US pennies in 2011, we start by using specify() to choose which variable in our pennies_sample data we’d like to work with. This can be done in one of two ways:

      +
1. Using the response argument:
      +
      pennies_sample %>% 
      +  specify(response = age_in_2011)
      +
      Response: age_in_2011 (integer)
      +# A tibble: 40 x 1
      +   age_in_2011
      +         <int>
      + 1           6
      + 2          30
      + 3          34
      + 4          19
      + 5           6
      + 6           5
      + 7          11
      + 8          19
      + 9          23
      +10          15
      +# … with 30 more rows
      +
2. Using formula notation:
      +
      pennies_sample %>% 
      +  specify(formula = age_in_2011 ~ NULL)
      +
      Response: age_in_2011 (integer)
      +# A tibble: 40 x 1
      +   age_in_2011
      +         <int>
      + 1           6
      + 2          30
      + 3          34
      + 4          19
      + 5           6
      + 6           5
      + 7          11
      + 8          19
      + 9          23
      +10          15
      +# … with 30 more rows
      +

      Note that the formula notation uses the common R methodology to include the response \(y\) variable on the left of the ~ and the explanatory \(x\) variable on the right of the “tilde.” Recall that you used this notation frequently with the lm() function in Chapters 6 and 7 when fitting regression models. Either notation works just fine, but a preference is usually given here for the formula notation to further build on the ideas from earlier chapters.

      +
      +
      +

      9.2.2 Generate replicates

      +

      +

      After specify()ing the variables we’d like in our inferential analysis, we next feed that into the generate() verb. The generate() verb’s main argument is reps, which is used to give how many different repetitions one would like to perform. Another argument here is type, which is automatically determined by the kinds of variables passed into specify(). We can also be explicit and set this type to be type = "bootstrap". This type argument will be further used in hypothesis testing in Chapter 10 as well. Make sure to check out ?generate to see the options here and use the ? operator to better understand other verbs as well.
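As a small sketch of making that choice explicit, the pipeline below sets type = "bootstrap" by hand; based on the description above, the result should be equivalent to letting specify() determine the type automatically.

pennies_sample %>% 
  specify(response = age_in_2011) %>% 
  generate(reps = 1000, type = "bootstrap")   # 40 rows for each of the 1000 replicates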

      +

      Let’s generate() 1000 bootstrap samples:

      +
      thousand_bootstrap_samples <- pennies_sample %>% 
      +  specify(response = age_in_2011) %>% 
      +  generate(reps = 1000)
      +

      We can use the dplyr count() function to help us understand what the thousand_bootstrap_samples data frame looks like:

      +
      thousand_bootstrap_samples %>% count(replicate)
      +
      # A tibble: 1,000 x 2
      +# Groups:   replicate [1,000]
      +   replicate     n
      +       <int> <int>
      + 1         1    40
      + 2         2    40
      + 3         3    40
      + 4         4    40
      + 5         5    40
      + 6         6    40
      + 7         7    40
      + 8         8    40
      + 9         9    40
      +10        10    40
      +# … with 990 more rows
      +

      Notice that each replicate has 40 entries here. Now that we have 1000 different bootstrap samples, our next step is to calculate the bootstrap statistics for each sample.

      +
      +
      +

      9.2.3 Calculate summary statistics

      +

      +

      After generate()ing many different samples, we next want to condense those samples down into a single statistic for each replicated sample. As seen in the diagram, the calculate() function is helpful here.

      +

      As we did at the beginning of this chapter, we now want to calculate the mean age_in_2011 for each bootstrap sample. To do so, we use the stat argument and set it to "mean" below. The stat argument has a variety of different options here and we will see further examples of this throughout the remaining chapters.

      +
      bootstrap_distribution <- pennies_sample %>% 
      +  specify(response = age_in_2011) %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "mean")
      +bootstrap_distribution
      +
      # A tibble: 1,000 x 2
      +   replicate  stat
      +       <int> <dbl>
      + 1         1  26.5
      + 2         2  25.4
      + 3         3  26.0
      + 4         4  26  
      + 5         5  25.2
      + 6         6  29.0
      + 7         7  22.8
      + 8         8  26.4
      + 9         9  24.9
      +10        10  28.1
      +# … with 990 more rows
      +

      We see that the resulting data has 1000 rows and 2 columns corresponding to the 1000 replicates and the mean for each bootstrap sample.
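As a sketch of the flexibility of the stat argument mentioned above, swapping in a different summary statistic only requires changing that one argument; for instance, a bootstrap distribution of sample medians could be built as follows (the name bootstrap_medians is hypothetical).

bootstrap_medians <- pennies_sample %>% 
  specify(response = age_in_2011) %>% 
  generate(reps = 1000, type = "bootstrap") %>% 
  calculate(stat = "median")   # one median per replicate instead of one mean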

      +
      +

      Observed statistic / point estimate calculations

      +

Just as group_by() %>% summarize() produces a useful workflow in dplyr, we can also use specify() %>% calculate() to compute summary measures on our original sample data. It’s often helpful, both in confidence interval calculations and in hypothesis testing, to identify what the corresponding statistic is in the original data. For our example on penny age, we computed above a value of x_bar using the summarize() verb in dplyr:

      +
      pennies_sample %>% 
      +  summarize(stat = mean(age_in_2011))
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  25.1
      +

This can also be done by skipping the generate() step in the pipeline, feeding specify() directly into calculate():

      +
      pennies_sample %>% 
      +  specify(response = age_in_2011) %>% 
      +  calculate(stat = "mean")
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  25.1
      +

This shortcut will be particularly useful when the calculation of the observed statistic is tricky to do using dplyr alone. This is especially the case when working with more than one variable, as will be seen in Chapter 10.

      +
      +
      +
      +

      9.2.4 Visualize the results

      +

      +

      The visualize() verb provides a simple way to view the bootstrap distribution as a histogram of the stat variable values. It has many other arguments that one can use as well including the shading of the histogram values corresponding to the confidence interval values.

      +
      bootstrap_distribution %>% visualize()
      +

      +

      The shape of this resulting distribution may look familiar to you. It resembles the well-known normal (bell-shaped) curve.
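Since bootstrap_distribution is an ordinary data frame with a stat column, roughly the same picture can be drawn directly with ggplot2. This is just a sketch showing that visualize() is a convenience, not a requirement; the bin count here is an arbitrary choice.

# An equivalent histogram of the bootstrap statistics using ggplot2 directly
ggplot(bootstrap_distribution, aes(x = stat)) +
  geom_histogram(bins = 15, color = "white")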

      +

      The following diagram recaps the infer pipeline for creating a bootstrap distribution.

      +

      +
      +
      +
      +

      9.3 Now to confidence intervals

      +

      Definition: Confidence Interval

      +

      A confidence interval gives a range of plausible values for a parameter. It depends on a specified confidence level with higher confidence levels corresponding to wider confidence intervals and lower confidence levels corresponding to narrower confidence intervals. Common confidence levels include 90%, 95%, and 99%.

      +

      Usually we don’t just begin sections with a definition, but confidence intervals are simple to define and play an important role in the sciences and any field that uses data. You can think of a confidence interval as playing the role of a net when fishing. Instead of just trying to catch a fish with a single spear (estimating an unknown parameter by using a single point estimate/statistic), we can use a net to try to provide a range of possible locations for the fish (use a range of possible values based around our statistic to make a plausible guess as to the location of the parameter).

      +

      The bootstrapping process will provide bootstrap statistics that have a bootstrap distribution with center at (or extremely close to) the mean of the original sample. This can be seen by giving the observed statistic obs_stat argument the value of the point estimate x_bar.

      +
      bootstrap_distribution %>% visualize(obs_stat = x_bar)
      +

      +

      We can also compute the mean of the bootstrap distribution of means to see how it compares to x_bar:

      +
      bootstrap_distribution %>% 
      +  summarize(mean_of_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_means
      +          <dbl>
      +1          25.1
      +

In this case, we can see that the bootstrap distribution provides us with a guess as to what the variability in different sample means may look like, using only the original sample as our guide. We can quantify this variability in the form of a 95% confidence interval in a couple of different ways.

      +
      +

      9.3.1 The percentile method

      +

      One way to calculate a range of plausible values for the unknown mean age of coins in 2011 is to use the middle 95% of the bootstrap_distribution to determine our endpoints. Our endpoints are thus at the 2.5th and 97.5th percentiles. This can be done with infer using the get_ci() function. (You can also use the conf_int() or get_confidence_interval() functions here as they are aliases that work the exact same way.)

      +
      bootstrap_distribution %>% 
      +  get_ci(level = 0.95, type = "percentile")
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   21.0    29.3
      +

      These options are the default values for level and type so we can also just do:

      +
      percentile_ci <- bootstrap_distribution %>% 
      +  get_ci()
      +percentile_ci
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   21.0    29.3
      +

      Using the percentile method, our range of plausible values for the mean age of US pennies in circulation in 2011 is 20.972 years to 29.252 years. We can use the visualize() function to view this using the endpoints and direction arguments, setting direction to "between" (between the values) and endpoints to be those stored with name percentile_ci.

      +
      bootstrap_distribution %>% 
      +  visualize(endpoints = percentile_ci, direction = "between")
      +

      +

      You can see that 95% of the data stored in the stat variable in bootstrap_distribution falls between the two endpoints with 2.5% to the left outside of the shading and 2.5% to the right outside of the shading. The cut-off points that provide our range are shown with the darker lines.

      +
      +
      +

      9.3.2 The standard error method

      +

      If the bootstrap distribution is close to symmetric and bell-shaped, we can also use a shortcut formula for determining the lower and upper endpoints of the confidence interval. This is done by using the formula \(\bar{x} \pm (multiplier * SE),\) where \(\bar{x}\) is our original sample mean and \(SE\) stands for standard error and corresponds to the standard deviation of the bootstrap distribution. The value of \(multiplier\) here is the appropriate percentile of the standard normal distribution.

      +

      These are automatically calculated when level is provided with level = 0.95 being the default. (95% of the values in a standard normal distribution fall within 1.96 standard deviations of the mean, so \(multiplier = 1.96\) for level = 0.95, for example.) As mentioned, this formula assumes that the bootstrap distribution is symmetric and bell-shaped. This is often the case with bootstrap distributions, especially those in which the original distribution of the sample is not highly skewed.

      +

      Definition: standard error

      +

      The standard error is the standard deviation of the sampling distribution.

      +

      The variability of the sampling distribution may be approximated by the variability of the bootstrap distribution. Traditional theory-based methodologies for inference also have formulas for standard errors, assuming some conditions are met.

      +

      This \(\bar{x} \pm (multiplier * SE)\) formula is implemented in the get_ci() function as shown with our pennies problem using the bootstrap distribution’s variability as an approximation for the sampling distribution’s variability. We’ll see more on this approximation shortly.

      +

      Note that the center of the confidence interval (the point_estimate) must be provided for the standard error confidence interval.

      +
      standard_error_ci <- bootstrap_distribution %>% 
      +  get_ci(type = "se", point_estimate = x_bar)
      +standard_error_ci
      +
      # A tibble: 1 x 2
      +  lower upper
      +  <dbl> <dbl>
      +1  21.0  29.3
      +
      bootstrap_distribution %>% 
      +  visualize(endpoints = standard_error_ci, direction = "between")
      +

      +

      We see that both methods produce nearly identical confidence intervals with the percentile method being \([20.97, 29.25]\) and the standard error method being \([20.97, 29.28]\).
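As a check on the standard error method, one could compute the interval by hand from the bootstrap distribution. This sketch uses 1.96 as the multiplier and the hypothetical names x_bar_value and bootstrap_se; it should land very close to the [20.97, 29.28] interval reported above.

# Sample mean of the original sample
x_bar_value <- pennies_sample %>% 
  summarize(mean_age = mean(age_in_2011)) %>% 
  pull(mean_age)

# Standard error approximated by the standard deviation of the bootstrap distribution
bootstrap_se <- bootstrap_distribution %>% 
  summarize(se = sd(stat)) %>% 
  pull(se)

# Standard error method interval computed "by hand"
c(lower = x_bar_value - 1.96 * bootstrap_se,
  upper = x_bar_value + 1.96 * bootstrap_se)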

      +
      +
      +
      +

      9.4 Comparing bootstrap and sampling distributions

      +

To help build up the idea of a confidence interval, we weren’t completely honest in our initial discussion. The pennies_sample data frame represents a sample from a larger number of pennies stored as pennies in the moderndive package. The pennies data frame (also in the moderndive package) contains 800 rows of data and two columns pertaining to the same variables as pennies_sample. Let’s begin by understanding some of the properties of the age_in_2011 variable in the pennies data frame.

      +
      ggplot(pennies, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +
      pennies %>% 
      +  summarize(mean_age = mean(age_in_2011),
      +            median_age = median(age_in_2011))
      +
      # A tibble: 1 x 2
      +  mean_age median_age
      +     <dbl>      <dbl>
      +1     21.2         20
      +

We see that pennies is slightly right-skewed, with the mean being pulled towards the upper outliers. Recall that pennies_sample was more symmetric than pennies. In fact, it actually exhibited some left skew when we compare the mean and median values.

      +
      ggplot(pennies_sample, aes(x = age_in_2011)) +
      +  geom_histogram(bins = 10, color = "white")
      +

      +
      pennies_sample %>% 
      +  summarize(mean_age = mean(age_in_2011),
      +            median_age = median(age_in_2011))
      +
      # A tibble: 1 x 2
      +  mean_age median_age
      +     <dbl>      <dbl>
      +1     25.1       25.5
      +
      +

      Sampling distribution

      +

      Let’s assume that pennies represents our population of interest. We can then create a sampling distribution for the population mean age of pennies, denoted by the Greek letter \(\mu\), using the rep_sample_n() function seen in Chapter 8. First we will create 1000 samples from the pennies data frame.

      +
      thousand_samples <- pennies %>% 
      +  rep_sample_n(size = 40, reps = 1000, replace = FALSE)
      +

      When creating a sampling distribution, we do not replace the items when we create each sample. This is in contrast to the bootstrap distribution. It’s important to remember that the sampling distribution is sampling without replacement from the population to better understand sample-to-sample variability, whereas the bootstrap distribution is sampling with replacement from our original sample to better understand potential sample-to-sample variability.

      +

      After sampling from pennies 1000 times, we next want to compute the mean age for each of the 1000 samples:

      +
      sampling_distribution <- thousand_samples %>% 
      +  group_by(replicate) %>% 
      +  summarize(stat = mean(age_in_2011))
      +

We could use ggplot() with geom_histogram() again, but since we’ve named our column in summarize() to be stat, we can also use the shortcut visualize() function in infer, specifying the number of bins and filling the bars with a different color such as "salmon". This will help us remember that "salmon" corresponds to “sampling distribution”.

      +
      sampling_distribution %>% 
      +  visualize(bins = 10, fill = "salmon")
      +
Figure 9.1: Sampling distribution for n=40 samples of pennies

      +
      +

      We can also examine the variability in this sampling distribution by calculating the standard deviation of the stat column. Remember that the standard deviation of the sampling distribution is the standard error, frequently denoted as se.

      +
      sampling_distribution %>% 
      +  summarize(se = sd(stat))
      +
      # A tibble: 1 x 1
      +     se
      +  <dbl>
      +1  2.01
      +
      +
      +

      Bootstrap distribution

      +

      Let’s now see how the shape of the bootstrap distribution compares to that of the sampling distribution. We’ll shade the bootstrap distribution blue to further assist with remembering which is which.

      +
      bootstrap_distribution %>% 
      +  visualize(bins = 10, fill = "blue")
      +

      +
      bootstrap_distribution %>% 
      +  summarize(se = sd(stat))
      +
      # A tibble: 1 x 1
      +     se
      +  <dbl>
      +1  2.12
      +

      Notice that while the standard deviations are similar, the center of the sampling distribution and the bootstrap distribution differ:

      +
      sampling_distribution %>% 
      +  summarize(mean_of_sampling_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_sampling_means
      +                   <dbl>
      +1                   21.2
      +
      bootstrap_distribution %>% 
      +  summarize(mean_of_bootstrap_means = mean(stat))
      +
      # A tibble: 1 x 1
      +  mean_of_bootstrap_means
      +                    <dbl>
      +1                    25.1
      +

      Since the bootstrap distribution is centered at the original sample mean, it doesn’t necessarily provide a good estimate of the overall population mean \(\mu\). Let’s calculate the mean of age_in_2011 for the pennies data frame to see how it compares to the mean of the sampling distribution and the mean of the bootstrap distribution.

      +
      pennies %>% 
      +  summarize(overall_mean = mean(age_in_2011))
      +
      # A tibble: 1 x 1
      +  overall_mean
      +         <dbl>
      +1         21.2
      +

Notice that this value matches up well with the mean of the sampling distribution. This is a consequence of the Central Limit Theorem introduced in Chapter 8: the mean of the sampling distribution is expected to be the mean of the overall population.

      +

The unfortunate fact, though, is that we don’t know the population mean in nearly all circumstances. The motivation for presenting it here was to show that the theory behind the Central Limit Theorem works using the tools you’ve worked with so far from the ggplot2, dplyr, moderndive, and infer packages.

      +

If a single sample mean by itself isn’t a reliable guess for the population mean, how should we go about estimating what the population mean may be when we can only select samples from the population? We’ve now come full circle and can discuss the underpinnings of the confidence interval and ways to interpret it.

      +
      +
      +
      +

      9.5 Interpreting the confidence interval

      +

      As shown above in Subsection 9.3.1, one range of plausible values for the population mean age of pennies in 2011, denoted by \(\mu\), is \([20.97, 29.25]\). Recall that this confidence interval is based on bootstrapping using pennies_sample. Note that the mean of pennies (21.152) does fall in this confidence interval. If we had a different sample of size 40 and constructed a confidence interval using the same method, would we be guaranteed that it contained the population parameter value as well? Let’s try it out:

      +
      pennies_sample2 <- pennies %>% 
      +  sample_n(size = 40)
      +

      Note the use of the sample_n() function in the dplyr package here. This does the same thing as rep_sample_n(reps = 1) but omits the extra replicate column.

      +

      We next create an infer pipeline to generate a percentile-based 95% confidence interval for \(\mu\):

      +
      percentile_ci2 <- pennies_sample2 %>% 
      +  specify(formula = age_in_2011 ~ NULL) %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "mean") %>% 
      +  get_ci()
      +percentile_ci2
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1   18.4    25.3
      +

This new confidence interval also contains the value of \(\mu\). Let’s further investigate by repeating this process 100 times to get 100 different confidence intervals derived from 100 different samples of pennies. Each sample will have a size of 40, just as the original sample did. We will plot each of these confidence intervals as horizontal lines. We will also show a line corresponding to the known population value of 21.152 years.

      +

      +

      Of the 100 confidence intervals based on samples of size \(n = 40\), 96 of them captured the population mean \(\mu = 21.152\), whereas 4 of them did not include it. If we repeated this process of building confidence intervals more times with more samples, we’d expect 95% of them to contain the population mean. In other words, the procedure we have used to generate confidence intervals is “95% reliable” in that we can expect it to include the true population parameter 95% of the time if the process is repeated.
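The code that generated these 100 intervals is not shown above, but a sketch of one way such a simulation could be written follows. The name ci_coverage is hypothetical, the purrr package is assumed to be available, and the loop is slow since each of the 100 samples is bootstrapped 1000 times.

library(purrr)

# For each of 100 samples of size 40, build a percentile-based 95% CI
ci_coverage <- map_dfr(1:100, function(i) {
  pennies %>% 
    sample_n(size = 40) %>% 
    specify(response = age_in_2011) %>% 
    generate(reps = 1000, type = "bootstrap") %>% 
    calculate(stat = "mean") %>% 
    get_ci(level = 0.95, type = "percentile")
})

# Count how many of the 100 intervals capture the population mean of 21.152
ci_coverage %>% 
  summarize(captured = sum(`2.5%` <= 21.152 & 21.152 <= `97.5%`))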

      +

      To further accentuate this point, let’s perform a similar procedure using 90% confidence intervals instead. This time we will use the standard error method instead of the percentile method for computing the confidence intervals.

      +

      +

Of the 100 confidence intervals based on samples of size \(n = 40\), 87 of them captured the population mean \(\mu = 21.152\), whereas 13 of them did not include it. Repeating this process for more samples would result in us getting closer and closer to 90% of the confidence intervals including the true value. When interpreting a confidence interval, it is common to say that we are “95% confident” or “90% confident” that the true value falls within the range of the specified confidence interval. We will use this “confident” language throughout the rest of this chapter, but remember that it has more to do with a measure of reliability of the interval-building process.

      +
      +

      Back to our pennies example

      +

      After this elaboration on what the level corresponds to in a confidence interval, let’s conclude by providing an interpretation of the original confidence interval result we found in Subsection 9.3.1.

      +

      Interpretation: We are 95% confident that the true mean age of pennies in circulation in 2011 is between 20.972 and 29.252 years. This level of confidence is based on the percentile-based method including the true mean 95% of the time if many different samples (not just the one we used) were collected and confidence intervals were created.

      +
      +
      +
      +

      9.6 EXAMPLE: One proportion

      +

      Let’s revisit our exercise of trying to estimate the proportion of red balls in the bowl from Chapter 8. We are now interested in determining a confidence interval for population parameter \(p\), the proportion of balls that are red out of the total \(N = 2400\) red and white balls.

      +

      We will use the first sample reported from Ilyas and Yohan in Subsection 8.2.2 for our point estimate. They observed 21 red balls out of the 50 in their shovel. This data is stored in the tactile_shovel1 data frame in the moderndive package.

      + +
      tactile_shovel1
      +
      # A tibble: 50 x 1
      +   color
      +   <chr>
      + 1 red  
      + 2 red  
      + 3 white
      + 4 red  
      + 5 white
      + 6 red  
      + 7 red  
      + 8 white
      + 9 red  
      +10 white
      +# … with 40 more rows
      +
      +

      9.6.1 Observed Statistic

      +

      To compute the proportion that are red in this data we can use the specify() %>% calculate() workflow. Note the use of the success argument here to clarify which of the two colors "red" or "white" we are interested in.

      +
      p_hat <- tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  calculate(stat = "prop")
      +p_hat
      +
      # A tibble: 1 x 1
      +   stat
      +  <dbl>
      +1  0.42
      +
      +
      +

      9.6.2 Bootstrap distribution

      +

      Next we want to calculate many different bootstrap samples and their corresponding bootstrap statistic (the proportion of red balls). We’ve done 1000 in the past, but let’s go up to 10,000 now to better see the resulting distribution. Recall that this is done by including a generate() function call in the middle of our pipeline:

      +
      tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  generate(reps = 10000)
      +

      This results in 50 rows for each of the 10,000 replicates. Lastly, we finish the infer pipeline by adding back in the calculate() step.

      +
      bootstrap_props <- tactile_shovel1 %>% 
      +  specify(formula = color ~ NULL, success = "red") %>% 
      +  generate(reps = 10000) %>% 
      +  calculate(stat = "prop")
      +

      Let’s visualize() what the resulting bootstrap distribution looks like as a histogram. We’ve adjusted the number of bins here as well to better see the resulting shape.

      +
      bootstrap_props %>% visualize(bins = 25)
      +

      +

      We see that the resulting distribution is symmetric and bell-shaped so it doesn’t much matter which confidence interval method we choose. Let’s use the standard error method to create a 95% confidence interval.

      +
      standard_error_ci <- bootstrap_props %>% 
      +  get_ci(type = "se", level = 0.95, point_estimate = p_hat)
      +standard_error_ci
      +
      # A tibble: 1 x 2
      +  lower upper
      +  <dbl> <dbl>
      +1 0.284 0.556
      +
      bootstrap_props %>% 
      +  visualize(bins = 25, endpoints = standard_error_ci)
      +

      +

We are 95% confident that the true proportion of red balls in the bowl is between 0.284 and 0.556. This level of confidence is based on the standard error-based method including the true proportion 95% of the time if many different samples (not just the one we used) were collected and confidence intervals were created.

      +
      +
      +

      9.6.3 Theory-based confidence intervals

      +

      When the bootstrap distribution has the nice symmetric, bell shape that we saw in the red balls example above, we can also use a formula to quantify the standard error. This provides another way to compute a confidence interval, but is a little more tedious and mathematical. The steps are outlined below. We’ve also shown how we can use the confidence interval (CI) interpretation in this case as well to support your understanding of this tricky concept.

      +
      +

      Procedure for building a theory-based CI for \(p\)

      +

To construct a theory-based confidence interval for \(p\), the unknown true population proportion, we:

      +
1. Collect a sample of size \(n\)
2. Compute \(\widehat{p}\)
3. Compute the standard error \[\text{SE} = \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
4. Compute the margin of error \[\text{MoE} = 1.96 \cdot \text{SE} = 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
5. Compute both end points of the confidence interval:
   • The lower end point lower_ci: \[\widehat{p} - \text{MoE} = \widehat{p} - 1.96 \cdot \text{SE} = \widehat{p} - 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
   • The upper end point upper_ci: \[\widehat{p} + \text{MoE} = \widehat{p} + 1.96 \cdot \text{SE} = \widehat{p} + 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
6. Alternatively, you can succinctly summarize a 95% confidence interval for \(p\) using the \(\pm\) symbol:
      +

\[\widehat{p} \pm \text{MoE} = \widehat{p} \pm 1.96 \cdot \text{SE} = \widehat{p} \pm 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\]
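As a quick sketch of these steps applied to Ilyas and Yohan’s sample of 21 red balls out of 50, the arithmetic below reproduces a theory-based 95% interval of roughly [0.283, 0.557], very close to the bootstrap standard error interval of [0.284, 0.556] found earlier. The variable names here are for illustration only.

p_hat_iy <- 21 / 50                               # sample proportion red
se_iy    <- sqrt(p_hat_iy * (1 - p_hat_iy) / 50)  # standard error formula above
moe_iy   <- 1.96 * se_iy                          # margin of error
c(lower_ci = p_hat_iy - moe_iy, upper_ci = p_hat_iy + moe_iy)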

      +
      +
      +

      Confidence intervals based on 33 tactile samples

      +

      Let’s load the tactile sampling data for the 33 groups from Chapter 8. Recall this data was saved in the tactile_prop_red data frame included in the moderndive package.

      + +
      tactile_prop_red
      +

      Let’s now apply the above procedure for constructing confidence intervals for \(p\) using the data saved in tactile_prop_red by adding/modifying new columns using the dplyr package data wrangling tools seen in Chapter 5:

      +
1. Rename prop_red to p_hat, the official name of the sample proportion
2. Make explicit the sample size n of \(n = 50\)
3. Add the standard error SE
4. Add the margin of error MoE
5. Add the left endpoint of the confidence interval lower_ci
6. Add the right endpoint of the confidence interval upper_ci
      +
      conf_ints <- tactile_prop_red %>% 
      +  rename(p_hat = prop_red) %>% 
      +  mutate(
      +    n = 50,
      +    SE = sqrt(p_hat * (1 - p_hat) / n),
      +    MoE = 1.96 * SE,
      +    lower_ci = p_hat - MoE,
      +    upper_ci = p_hat + MoE
      +  )
      +conf_ints
| group | red_balls | p_hat | n | SE | MoE | lower_ci | upper_ci |
|---|---|---|---|---|---|---|---|
| Ilyas, Yohan | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Morgan, Terrance | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471 |
| Martin, Thomas | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Clark, Frank | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Riddhi, Karina | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493 |
| Andrew, Tyler | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515 |
| Julia | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515 |
| Rachel, Lauren | 11 | 0.22 | 50 | 0.059 | 0.115 | 0.105 | 0.335 |
| Daniel, Caroline | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427 |
| Josh, Maeve | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471 |
| Emily, Emily | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449 |
| Conrad, Emily | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493 |
| Oliver, Erik | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471 |
| Isabel, Nam | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| X, Claire | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427 |
| Cindy, Kimberly | 20 | 0.40 | 50 | 0.069 | 0.136 | 0.264 | 0.536 |
| Kevin, James | 11 | 0.22 | 50 | 0.059 | 0.115 | 0.105 | 0.335 |
| Nam, Isabelle | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Harry, Yuko | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427 |
| Yuki, Eileen | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449 |
| Ramses | 23 | 0.46 | 50 | 0.070 | 0.138 | 0.322 | 0.598 |
| Joshua, Elizabeth, Stanley | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427 |
| Siobhan, Jane | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493 |
| Jack, Will | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449 |
| Caroline, Katie | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Griffin, Y | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493 |
| Kaitlin, Jordan | 17 | 0.34 | 50 | 0.067 | 0.131 | 0.209 | 0.471 |
| Ella, Garrett | 18 | 0.36 | 50 | 0.068 | 0.133 | 0.227 | 0.493 |
| Julie, Hailin | 15 | 0.30 | 50 | 0.065 | 0.127 | 0.173 | 0.427 |
| Katie, Caroline | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Mallory, Damani, Melissa | 21 | 0.42 | 50 | 0.070 | 0.137 | 0.283 | 0.557 |
| Katie | 16 | 0.32 | 50 | 0.066 | 0.129 | 0.191 | 0.449 |
| Francis, Vignesh | 19 | 0.38 | 50 | 0.069 | 0.135 | 0.245 | 0.515 |
      +

      Let’s plot:

      +
1. These 33 confidence intervals for \(p\): from lower_ci to upper_ci
2. The true population proportion \(p = 900 / 2400 = 0.375\) with a red vertical line
      +
Figure 9.2: 33 confidence intervals based on 33 tactile samples of size n=50

      +
      +

      We see that:

      +
• In 31 cases, the confidence intervals “capture” the true \(p = 900 / 2400 = 0.375\)
• In 2 cases, the confidence intervals do not “capture” the true \(p = 900 / 2400 = 0.375\)
      +

Thus, the confidence intervals capture the true proportion \(31 / 33 = 93.939\%\) of the time using this theory-based methodology.

      +
      +
      +

      Confidence intervals based on 100 virtual samples

      +

Let’s say, however, that we repeated the above 100 times, not tactilely, but virtually. We’ll do this only 100 times instead of 1000 like we did before so that the results can fit on the screen. Again, the steps for computing a 95% confidence interval for \(p\) are:

      +
1. Collect a sample of size \(n = 50\) as we did in Chapter 8
2. Compute \(\widehat{p}\): the sample proportion red of these \(n = 50\) balls
3. Compute the standard error \(\text{SE} = \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
4. Compute the margin of error \(\text{MoE} = 1.96 \cdot \text{SE} = 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
5. Compute both end points of the confidence interval:
   • lower_ci: \(\widehat{p} - \text{MoE} = \widehat{p} - 1.96 \cdot \text{SE} = \widehat{p} - 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
   • upper_ci: \(\widehat{p} + \text{MoE} = \widehat{p} + 1.96 \cdot \text{SE} = \widehat{p} + 1.96 \cdot \sqrt{\frac{\widehat{p}(1-\widehat{p})}{n}}\)
      +

      Run the following three steps, being sure to View() the resulting data frame after each step so you can convince yourself of what’s going on:

      +
      # First: Take 100 virtual samples of n=50 balls
      +virtual_samples <- bowl %>% 
      +  rep_sample_n(size = 50, reps = 100)
      +
      +# Second: For each virtual sample compute the proportion red
      +virtual_prop_red <- virtual_samples %>% 
      +  group_by(replicate) %>% 
      +  summarize(red = sum(color == "red")) %>% 
      +  mutate(prop_red = red / 50)
      +
      +# Third: Compute the 95% confidence interval as above
      +virtual_prop_red <- virtual_prop_red %>% 
      +  rename(p_hat = prop_red) %>% 
      +  mutate(
      +    n = 50,
      +    SE = sqrt(p_hat*(1-p_hat)/n),
      +    MoE = 1.96 * SE,
      +    lower_ci = p_hat - MoE,
      +    upper_ci = p_hat + MoE
      +  )
      +

      Here are the results:

      +
Figure 9.3: 100 confidence intervals based on 100 virtual samples of size n=50

      +
      +

We see that of our 100 confidence intervals based on samples of size \(n = 50\), 96 of them captured the true \(p = 900/2400\), whereas 4 of them missed. As we create more and more confidence intervals based on more and more samples, about 95% of these intervals will capture the true value of \(p\). In other words, our procedure is “95% reliable.”

      +

Theory-based methods like this were largely used in the past because we didn’t have the computing power to perform simulation-based methods such as bootstrapping. They are still commonly used, though, and if the normality assumptions are met, they can provide a nice option for finding confidence intervals and performing hypothesis tests, as we will see in Chapter 10.

      +
      +
      +
      +
      +

      9.7 EXAMPLE: Comparing two proportions

      +

      If you see someone else yawn, are you more likely to yawn? In an episode of the show Mythbusters, they tested the myth that yawning is contagious. The snippet from the show is available to view in the United States on the Discovery Network website here. More information about the episode is also available on IMDb here.

      +

      Fifty adults who thought they were being considered for an appearance on the show were interviewed by a show recruiter (“confederate”) who either yawned or did not. Participants then sat by themselves in a large van and were asked to wait. While in the van, the Mythbusters watched via hidden camera to see if the unaware participants yawned. The data frame containing the results is available at mythbusters_yawn in the moderndive package. Let’s check it out.

      +
      mythbusters_yawn
      +
      # A tibble: 50 x 3
      +    subj group   yawn 
      +   <int> <chr>   <chr>
      + 1     1 seed    yes  
      + 2     2 control yes  
      + 3     3 seed    no   
      + 4     4 seed    yes  
      + 5     5 seed    no   
      + 6     6 control no   
      + 7     7 seed    yes  
      + 8     8 control no   
      + 9     9 control no   
      +10    10 seed    no   
      +# … with 40 more rows
      +
• The participant ID is stored in the subj variable with values of 1 to 50.
• The group variable is either "seed" for when a confederate was trying to influence the participant or "control" if a confederate did not interact with the participant.
• The yawn variable is either "yes" if the participant yawned or "no" if the participant did not yawn.
      +

      We can use the janitor package to get a glimpse into this data in a table format:

      +
      mythbusters_yawn %>% 
      +  tabyl(group, yawn) %>% 
      +  adorn_percentages() %>% 
      +  adorn_pct_formatting() %>% 
      +  # To show original counts
      +  adorn_ns()
      +
         group         no        yes
      + control 75.0% (12) 25.0%  (4)
      +    seed 70.6% (24) 29.4% (10)
      +

      We are interested in comparing the proportion of those that yawned after seeing a seed versus those that yawned with no seed interaction. We’d like to see if the difference between these two proportions is significantly larger than 0. If so, we’d have evidence to support the claim that yawning is contagious based on this study.

      +

      In looking over this problem, we can make note of some important details to include in our infer pipeline:

      +
• We are calling a success having a yawn value of "yes".
• Our response variable will always correspond to the variable used in the success, so the response variable is yawn.
• The explanatory variable is the other variable of interest here: group.
      +

To summarize, we are looking to examine the relationship between yawning and whether or not the participant saw a seed yawn.

      +
      +

      9.7.1 Compute the point estimate

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group)
      +
      Error: A level of the response variable `yawn` needs to be specified for the `success` argument in `specify()`.
      +

      Note that the success argument must be specified in situations such as this where the response variable has only two levels.

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes")
      +
      Response: yawn (factor)
      +Explanatory: group (factor)
      +# A tibble: 50 x 2
      +   yawn  group  
      +   <fct> <fct>  
      + 1 yes   seed   
      + 2 yes   control
      + 3 no    seed   
      + 4 yes   seed   
      + 5 no    seed   
      + 6 no    control
      + 7 yes   seed   
      + 8 no    control
      + 9 no    control
      +10 no    seed   
      +# … with 40 more rows
      +

      We next want to calculate the statistic of interest for our sample. This corresponds to the difference in the proportion of successes.

      +
      mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  calculate(stat = "diff in props")
      +
      Error: Statistic is based on a difference; specify the `order` in which to subtract the levels of the explanatory variable. `order = c("first", "second")` means `("first" - "second")`. Check `?calculate` for details.
      +

We see another error here. To make sure that R knows exactly what we are after, we need to provide the order in which R should subtract these proportions of successes. As the error message states, we’ll want to put "seed" first after c() and then "control": order = c("seed", "control"). Our point estimate is thus calculated:

      +
      obs_diff <- mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  calculate(stat = "diff in props", order = c("seed", "control"))
      +obs_diff
      +
      # A tibble: 1 x 1
      +    stat
      +   <dbl>
      +1 0.0441
      +

This value represents the proportion of those that yawned after seeing a seed yawn (0.2941) minus the proportion of those that yawned without seeing a seed (0.25).
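A small dplyr sketch can verify these two group proportions directly from the raw data; it should agree with the janitor table shown earlier (roughly 0.294 for the seed group and 0.25 for the control group).

# Proportion of "yes" yawns within each group
mythbusters_yawn %>% 
  group_by(group) %>% 
  summarize(prop_yawn_yes = mean(yawn == "yes"))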

      +
      +
      +

      9.7.2 Bootstrap distribution

      +

Our next step in building a confidence interval is to create a bootstrap distribution of statistics (differences in proportions of successes). We saw how this works with a single variable both in computing bootstrap means in Subsection 9.1.3 and in computing bootstrap proportions in Section 9.6, but we haven’t yet worked with bootstrapping involving multiple variables.

      +

      In the infer package, bootstrapping with multiple variables means that each row is potentially resampled. Let’s investigate this by looking at the first few rows of mythbusters_yawn:

      +
      head(mythbusters_yawn)
      +
      # A tibble: 6 x 3
      +   subj group   yawn 
      +  <int> <chr>   <chr>
      +1     1 seed    yes  
      +2     2 control yes  
      +3     3 seed    no   
      +4     4 seed    yes  
      +5     5 seed    no   
      +6     6 control no   
      +

      When we bootstrap this data, we are potentially pulling the subject’s readings multiple times. Thus, we could see the entries of "seed" for group and "no" for yawn together in a new row in a bootstrap sample. This is further seen by exploring the sample_n() function in dplyr on this smaller 6 row data frame comprised of head(mythbusters_yawn). The sample_n() function can perform this bootstrapping procedure and is similar to the rep_sample_n() function in infer, except that it is not repeated but rather only performs one sample with or without replacement.

      +
      set.seed(2019)
      +
      head(mythbusters_yawn) %>% 
      +  sample_n(size = 6, replace = TRUE)
      +
      # A tibble: 6 x 3
      +   subj group   yawn 
      +  <int> <chr>   <chr>
      +1     5 seed    no   
      +2     5 seed    no   
      +3     2 control yes  
      +4     4 seed    yes  
      +5     1 seed    yes  
      +6     1 seed    yes  
      +

      We can see that in this bootstrap sample generated from the first six rows of mythbusters_yawn, we have some rows repeated. The same is true when we perform the generate() step in infer as done below.

      +
      bootstrap_distribution <- mythbusters_yawn %>% 
      +  specify(formula = yawn ~ group, success = "yes") %>% 
      +  generate(reps = 1000) %>% 
      +  calculate(stat = "diff in props", order = c("seed", "control"))
      +
      bootstrap_distribution %>% visualize(bins = 20)
      +

      +

      This distribution is roughly symmetric and bell-shaped but isn’t quite there. Let’s use the percentile-based method to compute a 95% confidence interval for the true difference in the proportion of those that yawn with and without a seed presented. The arguments are explicitly listed here but remember they are the defaults and simply get_ci() can be used.

      +
      bootstrap_distribution %>% 
      +  get_ci(type = "percentile", level = 0.95)
      +
      # A tibble: 1 x 2
      +  `2.5%` `97.5%`
      +   <dbl>   <dbl>
      +1 -0.219   0.293
      +

The confidence interval shown here includes the value of 0. We’ll see further in Chapter 10 what this means in terms of this difference being statistically significant or not, but let’s examine it a bit here first. The range of plausible values for the difference in the proportion of those that yawned with and without a seed is between -0.219 and 0.293.

      +

      Therefore, we are not sure which proportion is larger. Some of the bootstrap statistics showed the proportion without a seed to be higher and others showed the proportion with a seed to be higher. If the confidence interval was entirely above zero, we would be relatively sure (about “95% confident”) that the seed group had a higher proportion of yawning than the control group.

      +

Note that this all relates to the importance of denoting the order argument in the calculate() function. Since we specified "seed" and then "control", positive values for the statistic correspond to the "seed" proportion being higher, whereas negative values correspond to the "control" group being higher.

      +

The Mythbusters show declared the myth that “yawning is contagious” to be “confirmed”; via this confidence interval, we have evidence suggesting that such a conclusion is not statistically appropriate.

      +
      +

Learning check

      +
      +

      Practice problems to come soon!

      +
      + +
      +
      +
      +
      +

      9.8 Conclusion

      +
      +

      9.8.1 What’s to come?

      +

      This chapter introduced the notions of bootstrapping and confidence intervals as ways to build intuition about population parameters using only the original sample information. We also concluded with a glimpse into statistical significance and we’ll dig much further into this in Chapter 10 up next!

      +
      +
      +

      9.8.2 Script of R code

      +

      An R script file of all R code used in this chapter is available here.


      A Statistical Background

      +
      +

      A.1 Basic statistical terms

      +
      +

      A.1.1 Mean

      +

      The mean is the most commonly reported measure of center. It is commonly called the “average” though this term can be a little ambiguous. The mean is the sum of all of the data elements divided by how many elements there are. If we have \(n\) data points, the mean is given by: \[Mean = \frac{x_1 + x_2 + \cdots + x_n}{n}\]

      +
      +
      +

      A.1.2 Median

      +

      The median is calculated by first sorting a variable’s data from smallest to largest. After sorting the data, the middle element in the list is the median. If the middle falls between two values, then the median is the mean of those two values.

      +
      +
      +

      A.1.3 Standard deviation

      +

We will next discuss the standard deviation of a sample dataset pertaining to one variable. The formula can be a little intimidating at first, but it is important to remember that it is essentially a measure of how far we expect a given data value to be from its mean:

      +

      \[Standard \, deviation = \sqrt{\frac{(x_1 - Mean)^2 + (x_2 - Mean)^2 + \cdots + (x_n - Mean)^2}{n - 1}}\]
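For a concrete illustration, the small made-up vector below shows how these three measures are computed in R.

# Toy data for illustration only
x <- c(2, 4, 4, 7, 9)
mean(x)     # (2 + 4 + 4 + 7 + 9) / 5 = 5.2
median(x)   # middle value of the sorted data, here 4
sd(x)       # sample standard deviation, using n - 1 in the denominator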

      +
      +
      +

      A.1.4 Five-number summary

      +

The five-number summary consists of five values: minimum, first quantile (25th percentile), median (50th percentile), third quantile (75th percentile), and maximum. The quantiles are calculated as

      +
• first quantile (\(Q_1\)): the median of the first half of the sorted data
• third quantile (\(Q_3\)): the median of the second half of the sorted data
      +

      The interquartile range is defined as \(Q_3 - Q_1\) and is a measure of how spread out the middle 50% of values is. The five-number summary is not influenced by the presence of outliers in the ways that the mean and standard deviation are. It is, thus, recommended for skewed datasets.
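Continuing with a toy example, R can report the five-number summary and interquartile range directly; note that R offers several slightly different quantile conventions, so the quartiles it reports may differ a little from the “median of each half” description above.

# Toy data for illustration only
x <- c(2, 4, 4, 7, 9)
fivenum(x)   # minimum, Q1, median, Q3, maximum
IQR(x)       # Q3 - Q1, the interquartile range
summary(x)   # five-number summary plus the mean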

      +
      +
      +

      A.1.5 Distribution

      +

      The distribution of a variable/dataset corresponds to generalizing patterns in the dataset. It often shows how frequently elements in the dataset appear. It shows how the data varies and gives some information about where a typical element in the data might fall. Distributions are most easily seen through data visualization.

      +
      +
      +

      A.1.6 Outliers

      +

      Outliers correspond to values in the dataset that fall far outside the range of “ordinary” values. In regards to a boxplot (by default), they correspond to values below \(Q_1 - (1.5 * IQR)\) or above \(Q_3 + (1.5 * IQR)\).

      +

      Note that these terms (aside from Distribution) only apply to quantitative variables.


      B Inference Examples

      +

      This appendix is designed to provide you with examples of the five basic hypothesis tests and their corresponding confidence intervals. Traditional theory-based methods as well as computational-based methods are presented.

      +
      +

Note: This appendix is still under construction. If you would like to contribute, please check us out on GitHub at https://github.com/moderndive/moderndive_book.

      +

Please check out our sneak peek of infer below in the meanwhile. For more details on infer, visit https://infer.netlify.com/.

      +
      +
      +
      +

      Needed packages

      +
      library(dplyr)
      +library(ggplot2)
      +library(infer)
      +library(knitr)
      +library(readr)
      +library(janitor)
      +
      +
      +

      B.1 Inference mind map

      +

      To help you better navigate and choose the appropriate analysis, we’ve created a mind map on http://coggle.it available here and below.

      +
Figure B.1: Mind map for Inference

      +
      +
      +
      +

      B.2 One mean

      +
      +

      B.2.1 Problem statement

      +

The National Survey of Family Growth conducted by the Centers for Disease Control gathers information on family life, marriage and divorce, pregnancy, infertility, use of contraception, and men’s and women’s health. One of the variables collected on this survey is the age at first marriage. 5,534 randomly sampled US women between 2006 and 2010 completed the survey. The women sampled here had been married at least once. Do we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years? (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 4])

      +
      +
      +

      B.2.2 Competing hypotheses

      +
      +

      In words

      +
• Null hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is equal to 23 years.
• Alternative hypothesis: The mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.
      +
      +
      +

      In symbols (with annotations)

      +
• \(H_0: \mu = \mu_{0}\), where \(\mu\) represents the mean age of first marriage for all US women from 2006 to 2010 and \(\mu_0\) is 23.
• \(H_A: \mu > 23\)
      +
      +
      +

      Set \(\alpha\)

      +

      It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      +
      +
      +
      +

      B.2.3 Exploring the sample data

age_at_marriage <- read_csv("https://moderndive.com/data/ageAtMar.csv")

age_summ <- age_at_marriage %>%
  summarize(sample_size = n(),
    mean = mean(age),
    sd = sd(age),
    minimum = min(age),
    lower_quartile = quantile(age, 0.25),
    median = median(age),
    upper_quartile = quantile(age, 0.75),
    max = max(age))
kable(age_summ)

sample_size  mean    sd  minimum  lower_quartile  median  upper_quartile  max
       5534  23.4  4.72       10              20      23              26   43

The histogram below also shows the distribution of age.

ggplot(data = age_at_marriage, mapping = aes(x = age)) +
  geom_histogram(binwidth = 3, color = "white")

The observed statistic of interest here is the sample mean:

x_bar <- age_at_marriage %>% 
  specify(response = age) %>% 
  calculate(stat = "mean")
x_bar

# A tibble: 1 x 1
   stat
  <dbl>
1  23.4

      Guess about statistical significance

We are looking to see if the observed sample mean of 23.44 is statistically greater than \(\mu_0 = 23\). They seem to be quite close, but we have a large sample size here. Let’s guess that the large sample size will lead us to reject this practically small difference.

      B.2.4 Non-traditional methods

Bootstrapping for hypothesis test

In order to look to see if the observed sample mean of 23.44 is statistically greater than \(\mu_0 = 23\), we need to account for the sample size. We also need to determine a process that replicates how the original sample of size 5534 was selected.

We can use the idea of bootstrapping to simulate the population from which the sample came and then generate samples from that simulated population to account for sampling variability. Recall how bootstrapping would apply in this context:

1. Sample with replacement from our original sample of 5534 women and repeat this process 10,000 times,
2. calculate the mean for each of the 10,000 bootstrap samples created in Step 1,
3. combine all of these bootstrap statistics calculated in Step 2 into a boot_distn object, and
4. shift the center of this distribution over to the null value of 23. (This is needed since it will be centered at 23.44 via the process of bootstrapping.)

set.seed(2018)
null_distn_one_mean <- age_at_marriage %>% 
  specify(response = age) %>% 
  hypothesize(null = "point", mu = 23) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "mean")

null_distn_one_mean %>% visualize()

      We can next use this distribution to observe our \(p\)-value. Recall this is a right-tailed test so we will be looking for values that are greater than or equal to 23.44 for our \(p\)-value.

null_distn_one_mean %>%
  visualize(obs_stat = x_bar, direction = "greater")

Calculate \(p\)-value

pvalue <- null_distn_one_mean %>%
  get_pvalue(obs_stat = x_bar, direction = "greater")
pvalue

# A tibble: 1 x 1
  p_value
    <dbl>
1       0

So our \(p\)-value is 0 and we reject the null hypothesis at the 5% level. You can also see from the histogram above that we are far into the tail of the null distribution.

      Bootstrapping for confidence interval

We can also create a confidence interval for the unknown population parameter \(\mu\) using our sample data via bootstrapping. Note that we don’t need to shift this distribution since we want the center of our confidence interval to be our point estimate \(\bar{x}_{obs} = 23.44\).

boot_distn_one_mean <- age_at_marriage %>% 
  specify(response = age) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "mean")

ci <- boot_distn_one_mean %>% 
  get_ci()
ci

# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>
1   23.3    23.6

boot_distn_one_mean %>% 
  visualize(endpoints = ci, direction = "between")

We see that 23 is not contained in this confidence interval as a plausible value of \(\mu\) (the unknown population mean) and the entire interval is larger than 23. This matches with our hypothesis test results of rejecting the null hypothesis in favor of the alternative (\(\mu > 23\)).

Interpretation: We are 95% confident the true mean age of first marriage for all US women from 2006 to 2010 is between 23.316 and 23.565.

      B.2.5 Traditional methods

Check conditions

Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

1. Independent observations: The observations are collected independently.

   The cases are selected independently through random sampling so this condition is met.

2. Approximately normal: The distribution of the response variable should be normal or the sample size should be at least 30.

   The histogram for the sample above does show some skew.

The Q-Q plot below also shows some skew.

ggplot(data = age_at_marriage, mapping = aes(sample = age)) +
  stat_qq()

      The sample size here is quite large though (\(n = 5534\)) so both conditions are met.

Test statistic

      The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population mean \(\mu\). A good guess is the sample mean \(\bar{X}\). Recall that this sample mean is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely is it for us to have observed a sample mean of \(\bar{x}_{obs} = 23.44\) or larger assuming that the population mean is 23 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can “standardize” this original test statistic of \(\bar{X}\) into a \(T\) statistic that follows a \(t\) distribution with degrees of freedom equal to \(df = n - 1\):

\[ T =\dfrac{ \bar{X} - \mu_0}{ S / \sqrt{n} } \sim t (df = n - 1) \]

where \(S\) represents the standard deviation of the sample and \(n\) is the sample size.

Observed test statistic

      While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. We can use the t_test() function to perform this analysis for us.

t_test_results <- age_at_marriage %>% 
  infer::t_test(formula = age ~ NULL,
       alternative = "greater",
       mu = 23)
t_test_results

# A tibble: 1 x 6
  statistic  t_df  p_value alternative lower_ci upper_ci
      <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>
1      6.94  5533 2.25e-12 greater         23.3      Inf

      We see here that the \(t_{obs}\) value is 6.936.
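As a quick check (an added sketch, assuming the age_at_marriage data frame loaded above), the same value can be computed by plugging the summary statistics into the formula by hand:

age_at_marriage %>% 
  summarize(t_obs = (mean(age) - 23) / (sd(age) / sqrt(n())))

This should return approximately 6.94, matching the statistic column of the t_test() output.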

Compute \(p\)-value

The \(p\)-value—the probability of observing a \(t_{obs}\) value of 6.936 or more in our null distribution of a \(t\) with 5533 degrees of freedom—is essentially 0.
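As an added aside, this tail probability can also be computed directly from the \(t\) distribution in R; it matches the p_value column of the t_test() output above:

pt(6.936, df = 5533, lower.tail = FALSE)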


      State conclusion

We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample mean was statistically greater than the hypothesized mean has supporting evidence here. Based on this sample, we have evidence that the mean age of first marriage for all US women from 2006 to 2010 is greater than 23 years.

      Confidence interval

t.test(x = age_at_marriage$age, 
       alternative = "two.sided",
       mu = 23)$conf

[1] 23.3 23.6
attr(,"conf.level")
[1] 0.95

      B.2.6 Comparing results

Observing the bootstrap and null distributions that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met (the large sample size was the driver here) leads us to better guess that using any of the methods, whether traditional (formula-based) or non-traditional (computational-based), will lead to similar results.

      B.3 One proportion


      B.3.1 Problem statement

The CEO of a large electric utility claims that 80 percent of his 1,000,000 customers are satisfied with the service they receive. To test this claim, the local newspaper surveyed 100 customers, using simple random sampling. 73 were satisfied and the remaining were unsatisfied. Based on these findings from the sample, can we reject the CEO’s hypothesis that 80% of the customers are satisfied? [Tweaked a bit from http://stattrek.com/hypothesis-test/proportion.aspx?Tutorial=AP]

      B.3.2 Competing hypotheses

In words

• Null hypothesis: The proportion of all customers of the large electric utility satisfied with service they receive is equal to 0.80.

• Alternative hypothesis: The proportion of all customers of the large electric utility satisfied with service they receive is different from 0.80.

      In symbols (with annotations)

• \(H_0: \pi = p_{0}\), where \(\pi\) represents the proportion of all customers of the large electric utility satisfied with service they receive and \(p_0\) is 0.8.

• \(H_A: \pi \ne 0.8\)

      Set \(\alpha\)

It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      B.3.3 Exploring the sample data

elec <- c(rep("satisfied", 73), rep("unsatisfied", 27)) %>% 
  as_data_frame() %>% 
  rename(satisfy = value)

The bar graph below also shows the distribution of satisfy.

ggplot(data = elec, aes(x = satisfy)) + 
  geom_bar()

The observed statistic is computed as

p_hat <- elec %>% 
  specify(response = satisfy, success = "satisfied") %>% 
  calculate(stat = "prop")
p_hat

# A tibble: 1 x 1
   stat
  <dbl>
1  0.73

      Guess about statistical significance

We are looking to see if the sample proportion of 0.73 is statistically different from \(p_0 = 0.8\) based on this sample. They seem to be quite close, and our sample size is not huge here (\(n = 100\)). Let’s guess that we do not have evidence to reject the null hypothesis.

      B.3.4 Non-traditional methods

Simulation for hypothesis test

      In order to look to see if 0.73 is statistically different from 0.8, we need to account for the sample size. We also need to determine a process that replicates how the original sample of size 100 was selected. We can use the idea of an unfair coin to simulate this process. We will simulate flipping an unfair coin (with probability of success 0.8 matching the null hypothesis) 100 times. Then we will keep track of how many heads come up in those 100 flips. Our simulated statistic matches with how we calculated the original statistic \(\hat{p}\): the number of heads (satisfied) out of our total sample of 100. We then repeat this process many times (say 10,000) to create the null distribution looking at the simulated proportions of successes:

set.seed(2018)
null_distn_one_prop <- elec %>% 
  specify(response = satisfy, success = "satisfied") %>% 
  hypothesize(null = "point", p = 0.8) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "prop")

null_distn_one_prop %>% visualize()

We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are 0.8 - 0.73 = 0.07 away from 0.8 in BOTH directions for our \(p\)-value:

null_distn_one_prop %>% 
  visualize(obs_stat = p_hat, direction = "both")

Calculate \(p\)-value

pvalue <- null_distn_one_prop %>% 
  get_pvalue(obs_stat = p_hat, direction = "both")
pvalue

# A tibble: 1 x 1
  p_value
    <dbl>
1  0.0813

      So our \(p\)-value is 0.081 and we fail to reject the null hypothesis at the 5% level.


      Bootstrapping for confidence interval

We can also create a confidence interval for the unknown population parameter \(\pi\) using our sample data. To do so, we use bootstrapping, which involves

1. sampling with replacement from our original sample of 100 survey respondents and repeating this process 10,000 times,
2. calculating the proportion of successes for each of the 10,000 bootstrap samples created in Step 1,
3. combining all of these bootstrap statistics calculated in Step 2 into a boot_distn object,
4. identifying the 2.5th and 97.5th percentiles of this distribution (corresponding to the 5% significance level chosen) to find a 95% confidence interval for \(\pi\), and
5. interpreting this confidence interval in the context of the problem.

boot_distn_one_prop <- elec %>% 
  specify(response = satisfy, success = "satisfied") %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "prop")

      Just as we use the mean function for calculating the mean over a numerical variable, we can also use it to compute the proportion of successes for a categorical variable where we specify what we are calling a “success” after the ==. (Think about the formula for calculating a mean and how R handles logical statements such as satisfy == "satisfied" for why this must be true.)
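For instance (a small added illustration using the elec data frame created above), the observed proportion could equivalently be computed as:

elec %>% 
  summarize(prop_satisfied = mean(satisfy == "satisfied"))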

ci <- boot_distn_one_prop %>% 
  get_ci()
ci

# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>
1   0.64    0.81

boot_distn_one_prop %>% 
  visualize(endpoints = ci, direction = "between")

      We see that 0.80 is contained in this confidence interval as a plausible value of \(\pi\) (the unknown population proportion). This matches with our hypothesis test results of failing to reject the null hypothesis.

Interpretation: We are 95% confident the true proportion of customers who are satisfied with the service they receive is between 0.64 and 0.81.

      B.3.5 Traditional methods


      Check conditions

Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

1. Independent observations: The observations are collected independently.

   The cases are selected independently through random sampling so this condition is met.

2. Approximately normal: The number of expected successes and expected failures is at least 10.

   This condition is met: under the null hypothesis we expect \(100 \cdot 0.8 = 80\) successes and \(100 \cdot 0.2 = 20\) failures, and the observed counts of 73 and 27 are also both greater than 10.

      Test statistic

      +

      The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population proportion \(\pi\). A good guess is the sample proportion \(\hat{P}\). Recall that this sample proportion is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely is it for us to have observed a sample proportion of \(\hat{p}_{obs} = 0.73\) or larger assuming that the population proportion is 0.80 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can standardize this original test statistic of \(\hat{P}\) into a \(Z\) statistic that follows a \(N(0, 1)\) distribution.

\[ Z =\dfrac{ \hat{P} - p_0}{\sqrt{\dfrac{p_0(1 - p_0)}{n} }} \sim N(0, 1) \]

Observed test statistic

      While one could compute this observed test statistic by “hand” by plugging the observed values into the formula, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. The calculation has been done in R below for completeness though:

p_hat <- 0.73
p0 <- 0.8
n <- 100
(z_obs <- (p_hat - p0) / sqrt( (p0 * (1 - p0)) / n))

[1] -1.75

      We see here that the \(z_{obs}\) value is around -1.75. Our observed sample proportion of 0.73 is 1.75 standard errors below the hypothesized parameter value of 0.8.


      Visualize and compute \(p\)-value

elec %>% 
  specify(response = satisfy, success = "satisfied") %>% 
  hypothesize(null = "point", p = 0.8) %>% 
  calculate(stat = "z") %>% 
  visualize(method = "theoretical", obs_stat = z_obs, direction = "both")

2 * pnorm(z_obs)

[1] 0.0801

      The \(p\)-value—the probability of observing an \(z_{obs}\) value of -1.75 or more extreme (in both directions) in our null distribution—is around 8%.

      +

      Note that we could also do this test directly using the prop.test function.

stats::prop.test(x = table(elec$satisfy),
       n = length(elec$satisfy),
       alternative = "two.sided",
       p = 0.8,
       correct = FALSE)

    1-sample proportions test without continuity correction

data:  table(elec$satisfy), null probability 0.8
X-squared = 3, df = 1, p-value = 0.08
alternative hypothesis: true p is not equal to 0.8
95 percent confidence interval:
 0.636 0.807
sample estimates:
   p 
0.73 

prop.test does a \(\chi^2\) test here but this matches up exactly with what we would expect: \(\chi^2_{obs} = 3.06 = (-1.75)^2 = (z_{obs})^2\) and the \(p\)-values are the same because we are focusing on a two-tailed test.

      +

      Note that the 95 percent confidence interval given above matches well with the one calculated using bootstrapping.


      State conclusion

      +

We, therefore, do not have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample proportion was not statistically different from the hypothesized proportion has not been invalidated. Based on this sample, we do not have evidence that the proportion of all customers of the large electric utility satisfied with service they receive is different from 0.80, at the 5% level.


      B.3.6 Comparing results

      +

      Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met leads us to better guess that using any of the methods whether they are traditional (formula-based) or non-traditional (computational-based) will lead to similar results.


      B.4 Two proportions


      B.4.1 Problem statement

A 2010 survey asked 827 randomly sampled registered voters in California “Do you support? Or do you oppose? Drilling for oil and natural gas off the Coast of California? Or do you not know enough to say?” Conduct a hypothesis test to determine if the data provide strong evidence that the proportion of college graduates who do not have an opinion on this issue is different than that of non-college graduates. (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 6])

      B.4.2 Competing hypotheses

In words

• Null hypothesis: There is no association between having an opinion on drilling and having a college degree for all registered California voters in 2010.

• Alternative hypothesis: There is an association between having an opinion on drilling and having a college degree for all registered California voters in 2010.

      Another way in words

• Null hypothesis: The probability that a California voter in 2010 who is a college graduate has no opinion on drilling is the same as that probability for a non-college graduate.

• Alternative hypothesis: These parameter probabilities are different.

      In symbols (with annotations)

• \(H_0: \pi_{college} = \pi_{no\_college}\) or \(H_0: \pi_{college} - \pi_{no\_college} = 0\), where \(\pi\) represents the probability of not having an opinion on drilling.

• \(H_A: \pi_{college} - \pi_{no\_college} \ne 0\)

      Set \(\alpha\)

It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      B.4.3 Exploring the sample data

offshore <- read_csv("https://moderndive.com/data/offshore.csv")

offshore %>% tabyl(college_grad, response)

 college_grad no opinion opinion
           no        131     258
          yes        104     334

off_summ <- offshore %>% 
  group_by(college_grad) %>% 
  summarize(prop_no_opinion = mean(response == "no opinion"),
    sample_size = n())

ggplot(offshore, aes(x = college_grad, fill = response)) +
  geom_bar(position = "fill") +
  coord_flip()

      Guess about statistical significance

      +

      We are looking to see if a difference exists in the size of the bars corresponding to no opinion for the plot. Based solely on the plot, we have little reason to believe that a difference exists since the bars seem to be about the same size, BUT…it’s important to use statistics to see if that difference is actually statistically significant!


      B.4.4 Non-traditional methods

Collecting summary info

The observed statistic is

d_hat <- offshore %>% 
  specify(response ~ college_grad, success = "no opinion") %>% 
  calculate(stat = "diff in props", order = c("yes", "no"))
d_hat

# A tibble: 1 x 1
     stat
    <dbl>
1 -0.0993

      Randomization for hypothesis test

In order to look to see if the observed sample proportion of no opinion for non-college graduates of 0.337 is statistically different than that for college graduates of 0.237, we need to account for the sample sizes. Note that this is the same as looking to see if \(\hat{p}_{grad} - \hat{p}_{nograd}\) is statistically different than 0. We also need to determine a process that replicates how the original group sizes of 389 and 438 were selected.

      We can use the idea of randomization testing (also known as permutation testing) to simulate the population from which the sample came (with two groups of different sizes) and then generate samples using shuffling from that simulated population to account for sampling variability.

set.seed(2018)
null_distn_two_props <- offshore %>% 
  specify(response ~ college_grad, success = "no opinion") %>%
  hypothesize(null = "independence") %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "diff in props", order = c("yes", "no"))

null_distn_two_props %>% visualize()

We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are less than or equal to -0.099 or greater than or equal to 0.099 for our \(p\)-value.

null_distn_two_props %>% 
  visualize(obs_stat = d_hat, direction = "two_sided")

Calculate \(p\)-value

pvalue <- null_distn_two_props %>% 
  get_pvalue(obs_stat = d_hat, direction = "two_sided")
pvalue

# A tibble: 1 x 1
  p_value
    <dbl>
1  0.0021

So our \(p\)-value is 0.002 and we reject the null hypothesis at the 5% level. You can also see from the histogram above that we are far into the tails of the null distribution.

      Bootstrapping for confidence interval

      +

      We can also create a confidence interval for the unknown population parameter \(\pi_{college} - \pi_{no\_college}\) using our sample data with bootstrapping.

boot_distn_two_props <- offshore %>% 
  specify(response ~ college_grad, success = "no opinion") %>%
  generate(reps = 10000) %>% 
  calculate(stat = "diff in props", order = c("yes", "no"))

ci <- boot_distn_two_props %>% 
  get_ci()
ci

# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>
1 -0.161 -0.0378

boot_distn_two_props %>% 
  visualize(endpoints = ci, direction = "between")

      We see that 0 is not contained in this confidence interval as a plausible value of \(\pi_{college} - \pi_{no\_college}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter, we have evidence that the proportion of college graduates in California with no opinion on drilling is different than that of non-college graduates.

Interpretation: We are 95% confident the true proportion of college graduates in California with no opinion on offshore drilling is between 0.04 and 0.16 lower than that of non-college graduates.


      B.4.5 Traditional methods

B.4.6 Check conditions

Remember that in order to use the short-cut (formula-based, theoretical) approach, we need to check that some conditions are met.

1. Independent observations: Each case that was selected must be independent of all the other cases selected.

   This condition is met since cases were selected at random to observe.

2. Sample size: The number of pooled successes and pooled failures must be at least 10 for each group. (A short sketch verifying these counts in R follows this list.)

   We need to first figure out the pooled success rate: \[\hat{p}_{obs} = \dfrac{131 + 104}{827} = 0.28.\] We now determine expected (pooled) success and failure counts:

   \(0.28 \cdot (131 + 258) = 108.92\), \(0.72 \cdot (131 + 258) = 280.08\)

   \(0.28 \cdot (104 + 334) = 122.64\), \(0.72 \cdot (104 + 334) = 315.36\)

3. Independent selection of samples: The cases are not paired in any meaningful way.

   We have no reason to suspect that a college graduate selected would have any relationship to a non-college graduate selected.
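Here is the short sketch referred to in condition 2 (an added illustration using the counts from the tabyl output above) that verifies the pooled expected counts:

# Pooled proportion of "no opinion" responses
p_pooled <- (131 + 104) / 827

n_no_grad <- 131 + 258   # non-college graduates
n_grad <- 104 + 334      # college graduates

# Expected pooled successes and failures in each group; all should be at least 10
c(p_pooled * n_no_grad, (1 - p_pooled) * n_no_grad,
  p_pooled * n_grad, (1 - p_pooled) * n_grad)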

      B.4.7 Test statistic

The test statistic is a random variable based on the sample data. Here, we are interested in seeing if our observed difference in sample proportions corresponding to no opinion on drilling (\(\hat{p}_{college, obs} - \hat{p}_{no\_college, obs}\) = -0.099) is statistically different than 0. Assuming that conditions are met and the null hypothesis is true, we can use the standard normal distribution to standardize the difference in sample proportions (\(\hat{P}_{college} - \hat{P}_{no\_college}\)) using the standard error of \(\hat{P}_{college} - \hat{P}_{no\_college}\) and the pooled estimate:

      \[ Z =\dfrac{ (\hat{P}_1 - \hat{P}_2) - 0}{\sqrt{\dfrac{\hat{P}(1 - \hat{P})}{n_1} + \dfrac{\hat{P}(1 - \hat{P})}{n_2} }} \sim N(0, 1) \] where \(\hat{P} = \dfrac{\text{total number of successes} }{ \text{total number of cases}}.\)
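To make the formula concrete, here is a small by-hand sketch (added for illustration; the appendix itself uses the infer package just below) that plugs in the counts from the table above:

p_grad <- 104 / 438      # proportion with no opinion among college graduates
p_no_grad <- 131 / 389   # proportion with no opinion among non-college graduates
p_pool <- (104 + 131) / 827

se <- sqrt(p_pool * (1 - p_pool) * (1 / 438 + 1 / 389))
(p_grad - p_no_grad) / se   # approximately -3.16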

Observed test statistic

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. We can use the calculate() function from infer to compute the standardized statistic for us.

z_hat <- offshore %>% 
  specify(response ~ college_grad, success = "no opinion") %>% 
  calculate(stat = "z", order = c("yes", "no"))
z_hat

# A tibble: 1 x 1
   stat
  <dbl>
1 -3.16

      The observed difference in sample proportions is 3.16 standard deviations smaller than 0.

      +

      The \(p\)-value—the probability of observing a \(Z\) value of -3.16 or more extreme in our null distribution—is 0.0016. This can also be calculated in R directly:

2 * pnorm(-3.16, lower.tail = TRUE)

[1] 0.00158

      B.4.8 State conclusion

      +

      We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that a statistically significant difference did not exist in the proportions of no opinion on offshore drilling between college educated and non-college educated Californians was not validated. We do have evidence to suggest that there is a dependency between college graduation and position on offshore drilling for Californians.


      B.4.9 Comparing results

      +

Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met leads us to better guess that using any of the methods, whether traditional (formula-based) or non-traditional (computational-based), will lead to similar results.

      B.5 Two means (independent samples)


      B.5.1 Problem statement

Average income varies from one region of the country to another, and it often reflects both lifestyles and regional living expenses. Suppose a new graduate is considering a job in two locations, Cleveland, OH and Sacramento, CA, and he wants to see whether the average income in one of these cities is higher than the other. He would like to conduct a hypothesis test based on two randomly selected samples from the 2000 Census. (Tweaked a bit from Diez, Barr, and Çetinkaya-Rundel 2014 [Chapter 5])

      B.5.2 Competing hypotheses

In words

• Null hypothesis: There is no association between income and location (Cleveland, OH and Sacramento, CA).

• Alternative hypothesis: There is an association between income and location (Cleveland, OH and Sacramento, CA).

      Another way in words

• Null hypothesis: The mean income is the same for both cities.

• Alternative hypothesis: The mean income is different for the two cities.

      In symbols (with annotations)

• \(H_0: \mu_{sac} = \mu_{cle}\) or \(H_0: \mu_{sac} - \mu_{cle} = 0\), where \(\mu\) represents the average income.

• \(H_A: \mu_{sac} - \mu_{cle} \ne 0\)

      Set \(\alpha\)

It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      B.5.3 Exploring the sample data

cle_sac <- read.delim("https://moderndive.com/data/cleSac.txt") %>%
  rename(metro_area = Metropolitan_area_Detailed,
         income = Total_personal_income) %>%
  na.omit()

inc_summ <- cle_sac %>% group_by(metro_area) %>%
  summarize(sample_size = n(),
    mean = mean(income),
    sd = sd(income),
    minimum = min(income),
    lower_quartile = quantile(income, 0.25),
    median = median(income),
    upper_quartile = quantile(income, 0.75),
    max = max(income))
kable(inc_summ)

metro_area       sample_size   mean     sd  minimum  lower_quartile  median  upper_quartile     max
Cleveland_ OH            212  27467  27681        0            8475   21000           35275  152400
Sacramento_ CA           175  32428  35774        0            8050   20000           49350  206900

The boxplot below also shows the mean for each group highlighted by the red dots.

ggplot(cle_sac, aes(x = metro_area, y = income)) +
  geom_boxplot() +
  stat_summary(fun.y = "mean", geom = "point", color = "red")

      Guess about statistical significance

      +

      We are looking to see if a difference exists in the mean income of the two levels of the explanatory variable. Based solely on the boxplot, we have reason to believe that no difference exists. The distributions of income seem similar and the means fall in roughly the same place.


      B.5.4 Non-traditional methods

Collecting summary info

We now compute the observed statistic:

d_hat <- cle_sac %>% 
  specify(income ~ metro_area) %>% 
  calculate(stat = "diff in means", 
            order = c("Sacramento_ CA", "Cleveland_ OH"))
d_hat

# A tibble: 1 x 1
   stat
  <dbl>
1 4960.

      Randomization for hypothesis test

In order to look to see if the observed sample mean for Sacramento of 32427.543 is statistically different than that for Cleveland of 27467.066, we need to account for the sample sizes. Note that this is the same as looking to see if \(\bar{x}_{sac} - \bar{x}_{cle}\) is statistically different than 0. We also need to determine a process that replicates how the original group sizes of 212 and 175 were selected.

      We can use the idea of randomization testing (also known as permutation testing) to simulate the population from which the sample came (with two groups of different sizes) and then generate samples using shuffling from that simulated population to account for sampling variability.

set.seed(2018)
null_distn_two_means <- cle_sac %>% 
  specify(income ~ metro_area) %>% 
  hypothesize(null = "independence") %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "diff in means",
            order = c("Sacramento_ CA", "Cleveland_ OH"))

null_distn_two_means %>% visualize()

      We can next use this distribution to observe our \(p\)-value. Recall this is a two-tailed test so we will be looking for values that are greater than or equal to 4960.477 or less than or equal to -4960.477 for our \(p\)-value.

null_distn_two_means %>% 
  visualize(obs_stat = d_hat, direction = "both")

Calculate \(p\)-value

pvalue <- null_distn_two_means %>% 
  get_pvalue(obs_stat = d_hat, direction = "both")
pvalue

# A tibble: 1 x 1
  p_value
    <dbl>
1   0.124

So our \(p\)-value is 0.124 and we fail to reject the null hypothesis at the 5% level. You can also see from the histogram above that we are not very far into the tail of the null distribution.

      Bootstrapping for confidence interval

We can also create a confidence interval for the unknown population parameter \(\mu_{sac} - \mu_{cle}\) using our sample data with bootstrapping. Here we will bootstrap each of the groups with replacement instead of shuffling, keeping each group at its original size of 175 for Sacramento and 212 for Cleveland.

boot_distn_two_means <- cle_sac %>% 
  specify(income ~ metro_area) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "diff in means",
            order = c("Sacramento_ CA", "Cleveland_ OH"))

ci <- boot_distn_two_means %>% 
  get_ci()
ci

# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>
1 -1446.  11308.

boot_distn_two_means %>% 
  visualize(endpoints = ci, direction = "between")

      We see that 0 is contained in this confidence interval as a plausible value of \(\mu_{sac} - \mu_{cle}\) (the unknown population parameter). This matches with our hypothesis test results of failing to reject the null hypothesis. Since zero is a plausible value of the population parameter, we do not have evidence that Sacramento incomes are different than Cleveland incomes.

Interpretation: We are 95% confident the true mean yearly income for those living in Sacramento is between 1445.53 dollars lower and 11307.82 dollars higher than for those living in Cleveland.

      +

      Note: You could also use the null distribution based on randomization with a shift to have its center at \(\bar{x}_{sac} - \bar{x}_{cle} = \$4960.48\) instead of at 0 and calculate its percentiles. The confidence interval produced via this method should be comparable to the one done using bootstrapping above.
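A minimal sketch of that idea (added here; it assumes the null_distn_two_means object created earlier):

# Shift the randomization distribution so it is centered at the observed difference,
# then take the 2.5th and 97.5th percentiles as an approximate 95% confidence interval
quantile(null_distn_two_means$stat + 4960.48, probs = c(0.025, 0.975))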


      B.5.5 Traditional methods

Check conditions

Remember that in order to use the short-cut (formula-based, theoretical) approach, we need to check that some conditions are met.

1. Independent observations: The observations are independent in both groups.

   This condition is met since the cases are randomly selected from each city.

2. Approximately normal: The distribution of the response for each group should be normal or the sample sizes should be at least 30.

ggplot(cle_sac, aes(x = income)) +
  geom_histogram(color = "white", binwidth = 20000) +
  facet_wrap(~ metro_area)

      We have some reason to doubt the normality assumption here since both the histograms show deviation from a normal model fitting the data well for each group. The sample sizes for each group are greater than 100 though so the assumptions should still apply.

3. Independent samples: The samples should be collected without any natural pairing.

   There is no mention of there being a relationship between those selected in Cleveland and in Sacramento.

      B.5.6 Test statistic

      +

      The test statistic is a random variable based on the sample data. Here, we are interested in seeing if our observed difference in sample means (\(\bar{x}_{sac, obs} - \bar{x}_{cle, obs}\) = 4960.477) is statistically different than 0. Assuming that conditions are met and the null hypothesis is true, we can use the \(t\) distribution to standardize the difference in sample means (\(\bar{X}_{sac} - \bar{X}_{cle}\)) using the approximate standard error of \(\bar{X}_{sac} - \bar{X}_{cle}\) (invoking \(S_{sac}\) and \(S_{cle}\) as estimates of unknown \(\sigma_{sac}\) and \(\sigma_{cle}\)).

      +

      \[ T =\dfrac{ (\bar{X}_1 - \bar{X}_2) - 0}{ \sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}} } \sim t (df = min(n_1 - 1, n_2 - 1)) \] where 1 = Sacramento and 2 = Cleveland with \(S_1^2\) and \(S_2^2\) the sample variance of the incomes of both cities, respectively, and \(n_1 = 175\) for Sacramento and \(n_2 = 212\) for Cleveland.
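As an added illustration (the appendix itself uses infer just below), the pieces of this formula can be computed directly from the sample summary statistics:

cle_sac %>% 
  group_by(metro_area) %>% 
  summarize(xbar = mean(income), s2 = var(income), n = n()) %>% 
  summarize(t_obs = diff(xbar) / sqrt(sum(s2 / n)))

Note this returns roughly +1.50 (Sacramento minus Cleveland); the infer pipeline below uses the order Cleveland then Sacramento, which flips the sign.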

Observed test statistic

      Note that we could also do (ALMOST) this test directly using the t.test function. The x and y arguments are expected to both be numeric vectors here so we’ll need to appropriately filter our datasets.

cle_sac %>% 
  specify(income ~ metro_area) %>% 
  calculate(stat = "t",
            order = c("Cleveland_ OH", "Sacramento_ CA"))

# A tibble: 1 x 1
   stat
  <dbl>
1 -1.50

      We see here that the observed test statistic value is around -1.5.

While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies.

      B.5.7 Compute \(p\)-value

      +

The \(p\)-value—the probability of observing a \(t_{174}\) value of -1.501 or more extreme (in both directions) in our null distribution—is 0.13. This can also be calculated in R directly:

2 * pt(-1.501, df = min(212 - 1, 175 - 1), lower.tail = TRUE)

[1] 0.135

      We can also approximate by using the standard normal curve:

2 * pnorm(-1.501)

[1] 0.133

      Note that the 95 percent confidence interval given above matches well with the one calculated using bootstrapping.


      B.5.8 State conclusion

      +

We, therefore, do not have sufficient evidence to reject the null hypothesis. Our initial guess that no statistically significant difference exists in the means was backed by this statistical analysis. We do not have evidence to suggest that the true mean income differs between Cleveland, OH and Sacramento, CA based on this data.


      B.5.9 Comparing results

      +

      Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions also being met leads us to better guess that using any of the methods whether they are traditional (formula-based) or non-traditional (computational-based) will lead to similar results.


      B.6 Two means (paired samples)


      Problem statement

      +

      Trace metals in drinking water affect the flavor and an unusually high concentration can pose a health hazard. Ten pairs of data were taken measuring zinc concentration in bottom water and surface water at 10 randomly selected locations on a stretch of river. Do the data suggest that the true average concentration in the surface water is smaller than that of bottom water? (Note that units are not given.) [Tweaked a bit from https://onlinecourses.science.psu.edu/stat500/node/51]


      B.6.1 Competing hypotheses

In words

• Null hypothesis: The mean concentration in the bottom water is the same as that of the surface water at different paired locations.

• Alternative hypothesis: The mean concentration in the surface water is smaller than that of the bottom water at different paired locations.

      In symbols (with annotations)

• \(H_0: \mu_{diff} = 0\), where \(\mu_{diff}\) represents the mean difference in concentration for surface water minus bottom water.

• \(H_A: \mu_{diff} < 0\)

      Set \(\alpha\)

It’s important to set the significance level before starting the testing using the data. Let’s set the significance level at 5% here.

      B.6.2 Exploring the sample data

zinc_tidy <- read_csv("https://moderndive.com/data/zinc_tidy.csv")

We want to look at the differences in surface - bottom for each location:

zinc_diff <- zinc_tidy %>% 
  group_by(loc_id) %>% 
  summarize(pair_diff = diff(concentration)) %>% 
  ungroup()

Next we calculate the mean difference as our observed statistic:

d_hat <- zinc_diff %>% 
  specify(response = pair_diff) %>% 
  calculate(stat = "mean")
d_hat

# A tibble: 1 x 1
     stat
    <dbl>
1 -0.0804

The histogram below also shows the distribution of pair_diff.

ggplot(zinc_diff, aes(x = pair_diff)) +
  geom_histogram(binwidth = 0.04, color = "white")

      Guess about statistical significance

      +

      We are looking to see if the sample paired mean difference of -0.08 is statistically less than 0. They seem to be quite close, but we have a small number of pairs here. Let’s guess that we will fail to reject the null hypothesis.


      B.6.3 Non-traditional methods

Bootstrapping for hypothesis test

In order to look to see if the observed sample mean difference \(\bar{x}_{diff} = -0.0804\) is statistically less than 0, we need to account for the number of pairs. We also need to determine a process that replicates how the paired data was selected in a way similar to how we calculated our original difference in sample means.

      +

      Treating the differences as our data of interest, we next use the process of bootstrapping to build other simulated samples and then calculate the mean of the bootstrap samples. We hypothesize that the mean difference is zero.

      +

      This process is similar to comparing the One Mean example seen above, but using the differences between the two groups as a single sample with a hypothesized mean difference of 0.

set.seed(2018)
null_distn_paired_means <- zinc_diff %>% 
  specify(response = pair_diff) %>% 
  hypothesize(null = "point", mu = 0) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "mean")

null_distn_paired_means %>% visualize()

We can next use this distribution to observe our \(p\)-value. Recall this is a left-tailed test so we will be looking for values that are less than or equal to -0.0804 for our \(p\)-value.

null_distn_paired_means %>% 
  visualize(obs_stat = d_hat, direction = "less")

Calculate \(p\)-value

pvalue <- null_distn_paired_means %>% 
  get_pvalue(obs_stat = d_hat, direction = "less")
pvalue

# A tibble: 1 x 1
  p_value
    <dbl>
1       0

So our \(p\)-value is essentially 0 and we reject the null hypothesis at the 5% level. You can also see from the histogram above that we are far into the left tail of the null distribution.

      Bootstrapping for confidence interval

We can also create a confidence interval for the unknown population parameter \(\mu_{diff}\) using our sample data (the calculated differences) with bootstrapping. This is similar to the bootstrapping done in a one sample mean case, except now our data is differences instead of raw numerical data. Note that this code is identical to the pipeline shown in the hypothesis test above except the hypothesize() function is not called.

boot_distn_paired_means <- zinc_diff %>% 
  specify(response = pair_diff) %>% 
  generate(reps = 10000) %>% 
  calculate(stat = "mean")

ci <- boot_distn_paired_means %>% 
  get_ci()
ci

# A tibble: 1 x 2
  `2.5%` `97.5%`
   <dbl>   <dbl>
1 -0.112 -0.0503

boot_distn_paired_means %>% 
  visualize(endpoints = ci, direction = "between")

      We see that 0 is not contained in this confidence interval as a plausible value of \(\mu_{diff}\) (the unknown population parameter). This matches with our hypothesis test results of rejecting the null hypothesis. Since zero is not a plausible value of the population parameter and since the entire confidence interval falls below zero, we have evidence that surface zinc concentration levels are lower, on average, than bottom level zinc concentrations.

Interpretation: We are 95% confident the true mean zinc concentration on the surface is between 0.05 and 0.11 units smaller than on the bottom.

      B.6.4 Traditional methods

Check conditions

Remember that in order to use the shortcut (formula-based, theoretical) approach, we need to check that some conditions are met.

1. Independent observations: The observations among pairs are independent.

   The locations are selected independently through random sampling so this condition is met.

2. Approximately normal: The distribution of the population of differences is normal or the number of pairs is at least 30.

   The histogram above does show some skew so we have reason to doubt the population being normal based on this sample. We also only have 10 pairs which is fewer than the 30 needed. A theory-based test may not be valid here.

      Test statistic

      +

The test statistic is a random variable based on the sample data. Here, we want to look at a way to estimate the population mean difference \(\mu_{diff}\). A good guess is the sample mean difference \(\bar{X}_{diff}\). Recall that this sample mean is actually a random variable that will vary as different samples are (theoretically, would be) collected. We are looking to see how likely is it for us to have observed a sample mean difference of \(\bar{x}_{diff, obs} = -0.0804\) or smaller assuming that the population mean difference is 0 (assuming the null hypothesis is true). If the conditions are met and assuming \(H_0\) is true, we can “standardize” this original test statistic of \(\bar{X}_{diff}\) into a \(T\) statistic that follows a \(t\) distribution with degrees of freedom equal to \(df = n - 1\):

      +

      \[ T =\dfrac{ \bar{X}_{diff} - 0}{ S_{diff} / \sqrt{n} } \sim t (df = n - 1) \]

      +

      where \(S\) represents the standard deviation of the sample differences and \(n\) is the number of pairs.
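As a quick check (an added sketch using the zinc_diff data frame created above), this statistic can be computed directly from the differences and should match the t_test() output below:

zinc_diff %>% 
  summarize(t_obs = mean(pair_diff) / (sd(pair_diff) / sqrt(n())))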

Observed test statistic

      While one could compute this observed test statistic by “hand”, the focus here is on the set-up of the problem and in understanding which formula for the test statistic applies. We can use the t_test function on the differences to perform this analysis for us.

t_test_results <- zinc_diff %>% 
  infer::t_test(formula = pair_diff ~ NULL, 
         alternative = "less",
         mu = 0)
t_test_results

# A tibble: 1 x 6
  statistic  t_df  p_value alternative lower_ci upper_ci
      <dbl> <dbl>    <dbl> <chr>          <dbl>    <dbl>
1     -4.86     9 0.000446 less            -Inf  -0.0501

      We see here that the \(t_{obs}\) value is -4.864.


      Compute \(p\)-value

      +

The \(p\)-value—the probability of observing a \(t_{obs}\) value of -4.864 or less in our null distribution of a \(t\) with 9 degrees of freedom—is essentially 0. This can also be calculated in R directly:

pt(-4.8638, df = nrow(zinc_diff) - 1, lower.tail = TRUE)

[1] 0.000446

      State conclusion

      +

      We, therefore, have sufficient evidence to reject the null hypothesis. Our initial guess that our observed sample mean difference was not statistically less than the hypothesized mean of 0 has been invalidated here. Based on this sample, we have evidence that the mean concentration in the bottom water is greater than that of the surface water at different paired locations.


      B.6.5 Comparing results

      +

      Observing the bootstrap distribution and the null distribution that were created, it makes quite a bit of sense that the results are so similar for traditional and non-traditional methods in terms of the \(p\)-value and the confidence interval since these distributions look very similar to normal distributions. The conditions were not met since the number of pairs was small, but the sample data was not highly skewed. Using any of the methods whether they are traditional (formula-based) or non-traditional (computational-based) lead to similar results here.

diff --git a/previous_versions/v0.4.0/C-appendixC.html b/previous_versions/v0.4.0/C-appendixC.html
new file mode 100644
index 000000000..c8c82b6fe
(Archived HTML build of the appendix; navigation and page boilerplate omitted. Page title: "C Reach for the Stars | An Introduction to Statistical and Data Sciences via R")

      C Reach for the Stars


      Needed packages

library(dplyr)
library(ggplot2)
library(knitr)
library(dygraphs)
library(nycflights13)

      C.1 Sorted barplots

      +

      Building upon the example in Section 3.8:

flights_table <- table(flights$carrier)
flights_table

   9E    AA    AS    B6    DL    EV    F9    FL    HA    MQ    OO    UA    US 
18460 32729   714 54635 48110 54173   685  3260   342 26397    32 58665 20536 
   VX    WN    YV 
 5162 12275   601 

      We can sort this table from highest to lowest counts by using the sort function:

sorted_flights <- sort(flights_table, decreasing = TRUE)
names(sorted_flights)

 [1] "UA" "B6" "EV" "DL" "AA" "MQ" "US" "9E" "WN" "VX" "FL" "AS" "F9" "YV" "HA"
[16] "OO"

      It is often preferred for barplots to be ordered corresponding to the heights of the bars. This allows the reader to more easily compare the ordering of different airlines in terms of departed flights (Robbins 2013). We can also much more easily answer questions like “How many airlines have more departing flights than Southwest Airlines?”.
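For instance (a small added check using the sorted table), that question can be answered directly:

# How many carriers had more departing flights than Southwest Airlines (WN)?
sum(sorted_flights > sorted_flights["WN"])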

      +

      We can use the sorted table giving the number of flights defined as sorted_flights to reorder the carrier.

ggplot(data = flights, mapping = aes(x = carrier)) +
  geom_bar() +
  scale_x_discrete(limits = names(sorted_flights))

Figure C.1: Number of flights departing NYC in 2013 by airline - Descending numbers

      The last addition here specifies the values of the horizontal x axis on a discrete scale to correspond to those given by the entries of sorted_flights.
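An alternative approach, sketched below, is to reorder the factor itself rather than overriding the scale: fct_infreq() from the forcats package (not loaded above, hence the :: prefix) orders a factor's levels by decreasing frequency, which produces the same descending barplot:

# Same sorted barplot, reordering the factor levels by frequency
ggplot(data = flights, mapping = aes(x = forcats::fct_infreq(carrier))) +
  geom_bar() +
  labs(x = "carrier")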


      C.2 Interactive graphics


      C.2.1 Interactive linegraphs


      Another useful tool for viewing linegraphs such as this is the dygraph function in the dygraphs package in combination with the dyRangeSelector function. This allows us to zoom in on a selected range and get an interactive plot for us to work with:

library(dygraphs)
flights_day <- mutate(flights, date = as.Date(time_hour))
flights_summarized <- flights_day %>% 
  group_by(date) %>%
  summarize(median_arr_delay = median(arr_delay, na.rm = TRUE))
rownames(flights_summarized) <- flights_summarized$date
flights_summarized <- select(flights_summarized, -date)
dyRangeSelector(dygraph(flights_summarized))



The syntax here is a little different than what we have covered so far. The dygraph function expects the dates to be given as the rownames of the object. We then remove the date variable from the flights_summarized data frame since it is accounted for in the rownames. Lastly, we run the dygraph function on the new data frame that only contains the median arrival delay as a column, and then provide the ability to zoom in on the interactive plot via dyRangeSelector. (Note that this plot will only be interactive in the HTML version of this book.)
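A common alternative, sketched here under the assumption that the xts package is installed, is to build an xts time-series object, which dygraph accepts directly and which avoids the rownames workaround. flights_daily below is just a re-summarized copy that keeps its date column:

library(xts)
# Re-summarize, keeping the date column this time
flights_daily <- flights_day %>% 
  group_by(date) %>%
  summarize(median_arr_delay = median(arr_delay, na.rm = TRUE))
# An xts object carries its own date index, so no rownames trick is needed
delays_xts <- xts(flights_daily$median_arr_delay, order.by = flights_daily$date)
dyRangeSelector(dygraph(delays_xts))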

diff --git a/previous_versions/v0.4.0/data/ageAtMar.csv b/previous_versions/v0.4.0/data/ageAtMar.csv
new file mode 100755
index 000000000..b68e12a1b
--- /dev/null
+++ b/previous_versions/v0.4.0/data/ageAtMar.csv
@@ -0,0 +1,5535 @@
+age
[raw age values elided]
diff --git a/previous_versions/v0.4.0/data/cleSac.txt b/previous_versions/v0.4.0/data/cleSac.txt
new file mode 100755
index 000000000..20e7da082
--- /dev/null
+++ b/previous_versions/v0.4.0/data/cleSac.txt
@@ -0,0 +1 @@
+Census_year State_FIPS_code Metropolitan_area_Detailed Age Sex Race_General Marital_status Total_personal_income [raw census records elided]
\ No newline at end of file
diff --git a/previous_versions/v0.4.0/data/dem_score.csv b/previous_versions/v0.4.0/data/dem_score.csv
new file mode 100755
index 000000000..c48fc1f49
--- /dev/null
+++ b/previous_versions/v0.4.0/data/dem_score.csv
@@ -0,0 +1,97 @@
+country,1952,1957,1962,1967,1972,1977,1982,1987,1992
[raw country rows elided]
diff --git a/previous_versions/v0.4.0/data/dem_score.xlsx b/previous_versions/v0.4.0/data/dem_score.xlsx
new file mode 100755
index 000000000..85d90daa9
Binary files /dev/null and b/previous_versions/v0.4.0/data/dem_score.xlsx differ
diff --git a/previous_versions/v0.4.0/data/ideology.csv b/previous_versions/v0.4.0/data/ideology.csv
new file mode 100755
index 000000000..302957298
--- /dev/null
+++ b/previous_versions/v0.4.0/data/ideology.csv
@@ -0,0 +1,76 @@
+city,state,state_ideology
[raw city rows elided]
Calif.",California,Liberal +"Orlando, Fla.",Florida,Conservative +"Oklahoma City, Okla.",Oklahoma,Conservative +Seattle,Washington,Liberal +"Kansas City, Mo.",Missouri,Conservative +"Nashville, Tenn.",Tennessee,Conservative +"Laredo, Texas",Texas,Conservative +"Fort Worth, Texas",Texas,Conservative +"Louisville, Ky.",Kentucky,Conservative +"Norfolk, Va.",Virginia,Liberal +"Arlington, Va.",Virginia,Liberal +Pittsburgh,Pennsylvania,Conservative +"Albuquerque, N.M.",New Mexico,Liberal +"Jersey City, N.J.",New Jersey,Liberal +"Raleigh, N.C.",North Carolina,Conservative +"Rochester, N.Y.",New York,Liberal +Cincinnati,Ohio,Conservative +"Long Beach, Calif.",California,Liberal +"Birmingham, Ala.",Alabama,Conservative +"Wichita, Kan.",Kansas,Conservative +"Virginia Beach, Va.",Virginia,Liberal +"Fresno, Calif.",California,Liberal +"Buffalo, N.Y.",New York,Liberal +Minneapolis,Minneapolis,Liberal +"Portland, Ore.",Oregon,Liberal +"Reno, Nev.",Nevada,Liberal +"Richmond, Va.",Virginia,Liberal +"Baton Rouge, La.",Louisiana,Conservative +"Jackson, Miss.",Mississippi,Conservative +"Riverside, Calif.",California,Liberal +"Fort Lauderdale, Fla.",Florida,Conservative +St. Louis,Missouri,Conservative +"Brownsville, Texas",Texas,Conservative +"Albany, N.Y.",New York,Liberal +"Colorado Springs, Colo.",Colorado,Liberal +"Savannah, Ga.",Georgia,Conservative +"Winston-Salem, N.C.",North Carolina,Conservative +"Toledo, Ohio",Ohio,Conservative +"Madison, Wis.",Wisconsin,Conservative +"Corpus Christi, Texas",Texas,Conservative +"San Bernardino, Calif.",California,Liberal \ No newline at end of file diff --git a/previous_versions/v0.4.0/data/le_mess.csv b/previous_versions/v0.4.0/data/le_mess.csv new file mode 100755 index 000000000..7cc6fb6fc --- /dev/null +++ b/previous_versions/v0.4.0/data/le_mess.csv @@ -0,0 +1,203 @@ +country,1951,1952,1953,1954,1955,1956,1957,1958,1959,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994,1995,1996,1997,1998,1999,2000,2001,2002,2003,2004,2005,2006,2007,2008,2009,2010,2011,2012,2013,2014,2015,2016 +Afghanistan,27.13,27.67,28.19,28.73,29.27,29.8,30.34,30.86,31.4,31.94,32.47,33.01,33.53,34.07,34.6,35.13,35.66,36.17,36.69,37.2,37.7,38.19,38.67,39.14,39.61,40.07,40.53,40.98,41.46,41.96,42.51,43.11,43.75,44.45,45.21,46.02,46.87,47.74,48.62,49.5,49.3,49.4,49.5,48.9,49.4,49.7,49.5,48.6,50.0,50.1,50.4,51.0,51.4,51.8,52.0,52.1,52.4,52.8,53.3,53.6,54.0,54.4,54.8,54.9,53.8,52.72 +Albania,54.72,55.23,55.85,56.59,57.45,58.42,59.48,60.6,61.75,62.87,63.92,64.84,65.6,66.18,66.59,66.88,67.11,67.32,67.55,67.83,68.16,68.53,68.93,69.35,69.77,70.17,70.54,70.86,71.14,71.39,71.63,71.88,72.15,72.42,72.71,72.96,73.14,73.25,73.3,73.3,73.4,73.6,73.6,73.6,73.7,73.8,74.1,74.2,74.2,74.7,75.1,75.5,75.7,75.9,76.2,76.4,76.6,76.8,77.0,77.2,77.4,77.5,77.7,77.9,78.0,78.1 +Algeria,43.03,43.5,43.96,44.44,44.93,45.44,45.94,46.45,46.97,47.5,48.02,48.55,49.07,49.58,50.09,50.58,51.05,51.49,51.95,52.41,52.88,53.38,53.91,54.52,55.24,56.11,57.13,58.28,59.56,60.92,62.31,63.69,64.97,66.15,67.18,68.04,68.75,69.33,69.81,70.2,70.5,70.9,71.2,71.4,71.6,72.1,72.4,72.6,73.0,73.3,73.5,73.8,73.9,74.4,74.8,75.0,75.3,75.5,75.7,76.0,76.1,76.2,76.3,76.3,76.4,76.5 
+Angola,31.05,31.59,32.14,32.69,33.24,33.78,34.33,34.88,35.43,35.98,36.53,37.08,37.63,38.18,38.74,39.28,39.84,40.39,40.95,41.5,42.06,42.62,43.17,43.71,44.22,44.68,45.12,45.5,45.84,46.14,46.42,46.69,46.96,47.23,47.5,47.75,47.99,48.2,48.4,48.6,49.3,49.6,48.4,50.0,50.9,51.3,51.7,51.8,51.8,52.3,52.5,53.3,53.9,54.5,55.2,55.7,56.2,56.7,57.1,57.6,58.1,58.5,58.8,59.2,59.6,60.0 +Antigua and Barbuda,58.26,58.8,59.34,59.87,60.41,60.93,61.45,61.97,62.48,62.97,63.46,63.93,64.38,64.81,65.23,65.63,66.03,66.41,66.81,67.19,67.56,67.94,68.3,68.64,68.99,69.32,69.64,69.96,70.28,70.59,70.9,71.22,71.52,71.82,72.13,72.42,72.7,72.97,73.24,73.5,73.6,73.5,73.4,73.4,73.5,73.5,73.9,74.1,74.0,73.8,74.1,74.3,74.5,74.6,74.9,74.9,75.3,75.5,75.7,75.8,75.9,76.1,76.2,76.3,76.4,76.5 +Argentina,61.93,62.54,63.1,63.59,64.03,64.41,64.73,65.0,65.22,65.39,65.53,65.64,65.74,65.84,65.95,66.08,66.26,66.47,66.72,67.01,67.32,67.64,67.96,68.28,68.6,68.92,69.24,69.57,69.89,70.2,70.51,70.78,71.04,71.26,71.46,71.66,71.84,72.05,72.26,72.5,72.7,72.8,73.1,73.4,73.5,73.5,73.6,73.8,73.9,74.2,74.3,74.3,74.5,75.0,75.3,75.3,75.2,75.4,75.6,75.8,76.0,76.1,76.2,76.3,76.5,76.7 +Armenia,62.67,63.13,63.6,64.07,64.54,65.0,65.45,65.92,66.39,66.86,67.33,67.82,68.3,68.78,69.26,69.74,70.22,70.67,71.1,71.47,71.79,72.02,72.19,72.28,72.33,72.38,72.44,72.53,72.63,72.72,72.73,72.64,72.43,72.1,71.7,71.24,70.82,70.46,70.22,70.1,69.7,68.8,68.3,68.6,69.1,69.4,70.0,70.5,70.8,71.3,71.4,71.6,71.5,71.8,71.8,71.7,72.3,72.3,72.6,73.0,73.5,73.9,74.3,74.5,74.7,74.9 +Aruba,58.96,60.01,60.98,61.87,62.69,63.42,64.09,64.68,65.2,65.66,66.07,66.44,66.79,67.11,67.44,67.76,68.1,68.44,68.78,69.14,69.5,69.85,70.19,70.52,70.83,71.14,71.44,71.74,72.02,72.29,72.54,72.75,72.93,73.07,73.18,73.26,73.33,73.38,73.43,73.47,73.51,73.54,73.57,73.6,73.62,73.65,73.67,73.7,73.73,73.78,73.85,73.94,74.05,74.18,74.32,74.47,74.62,74.77,74.92,75.06,75.19,75.32,75.46,75.59,75.72,75.85 +Australia,68.71,69.11,69.69,69.84,70.16,70.03,70.31,70.86,70.43,70.87,71.14,70.91,70.97,70.63,70.96,70.79,71.07,70.7,71.11,70.78,71.38,71.9,72.11,71.86,72.81,72.84,73.45,73.84,74.4,74.56,74.92,74.7,75.51,75.98,75.41,76.08,76.27,76.3,76.4,77.0,77.4,77.6,77.9,78.1,78.3,78.5,78.8,79.2,79.4,79.8,80.1,80.3,80.6,80.9,81.2,81.4,81.5,81.6,81.8,82.0,82.2,82.4,82.4,82.3,82.3,82.3 +Austria,65.24,66.78,67.27,67.3,67.58,67.7,67.46,68.46,68.39,68.75,69.72,69.51,69.64,70.13,69.92,70.22,70.1,70.25,70.02,70.07,70.27,70.59,71.16,71.15,71.28,71.77,72.12,72.2,72.51,72.64,72.96,73.12,73.19,73.73,73.95,74.43,74.86,75.34,75.43,75.7,75.8,76.0,76.2,76.5,76.8,77.1,77.6,77.8,78.0,78.2,78.6,78.8,79.0,79.4,79.5,80.0,80.1,80.4,80.3,80.5,80.7,80.9,81.1,81.2,81.3,81.4 +Azerbaijan,57.5,57.93,58.36,58.79,59.21,59.63,60.05,60.48,60.9,61.33,61.76,62.2,62.62,63.06,63.49,63.91,64.35,64.75,65.14,65.48,65.75,65.93,66.04,66.05,66.02,65.92,65.8,65.68,65.6,65.55,65.61,65.73,65.92,66.15,66.37,66.48,66.46,66.28,65.98,65.6,65.3,63.7,64.0,63.5,64.6,65.0,65.3,65.6,65.9,66.5,67.2,67.6,67.6,67.8,68.2,68.7,69.1,69.2,69.7,70.1,70.8,71.5,72.1,72.5,72.9,73.3 +Bahamas,58.91,59.29,59.67,60.03,60.39,60.72,61.06,61.38,61.69,62.0,62.29,62.58,62.85,63.13,63.4,63.65,63.91,64.14,64.39,64.61,64.85,65.08,65.3,65.53,65.74,65.96,66.16,66.37,66.57,66.75,66.95,67.12,67.31,67.5,67.67,67.86,68.02,68.2,68.35,68.5,68.9,69.2,69.7,69.5,69.7,70.0,70.2,70.1,70.1,70.2,70.3,70.4,71.1,71.7,71.7,72.0,71.8,72.2,72.7,72.7,72.6,72.7,72.9,73.5,73.7,73.9 
+Bahrain,41.45,42.32,43.26,44.27,45.35,46.49,47.7,48.97,50.29,51.64,52.99,54.33,55.64,56.9,58.1,59.23,60.29,61.29,62.22,63.1,63.92,64.67,65.38,66.03,66.63,67.2,67.72,68.21,68.67,69.09,69.47,69.83,70.16,70.46,70.73,70.98,71.2,71.41,71.61,71.8,72.0,72.1,72.5,72.9,73.0,73.4,73.8,74.0,74.2,73.7,74.3,74.8,75.3,75.7,76.1,76.3,77.0,77.6,78.2,78.7,78.8,79.0,79.1,79.1,79.1,79.1 +Bangladesh,42.58,42.87,43.19,43.54,43.91,44.3,44.73,45.19,45.68,46.2,46.73,47.28,47.81,48.29,48.6,48.63,48.37,47.83,47.09,46.31,45.74,45.52,45.77,46.49,47.58,48.92,50.27,51.47,52.44,53.18,53.72,54.15,54.57,55.0,55.47,55.96,56.46,56.94,57.42,57.9,56.4,59.7,60.5,61.2,61.6,62.4,63.2,63.9,64.6,64.9,65.4,65.8,66.3,66.8,67.1,67.5,67.7,68.3,68.6,68.8,69.3,69.4,69.8,70.1,70.4,70.7 +Barbados,56.82,57.41,57.99,58.56,59.13,59.67,60.22,60.76,61.28,61.8,62.31,62.79,63.27,63.74,64.2,64.64,65.08,65.5,65.91,66.31,66.71,67.09,67.47,67.83,68.17,68.53,68.87,69.22,69.57,69.91,70.25,70.58,70.91,71.23,71.54,71.85,72.14,72.43,72.72,73.0,73.2,73.2,73.1,73.0,73.3,73.7,73.9,74.1,74.2,74.0,74.4,74.6,74.8,74.9,75.0,75.0,75.1,75.3,75.3,75.2,75.2,75.4,75.5,75.6,75.7,75.8 +Belarus,65.11,65.54,65.96,66.37,66.77,67.16,67.52,67.88,68.82,71.59,72.3,71.01,71.66,73.17,72.7,73.05,72.78,72.88,72.47,71.94,72.56,72.26,72.29,72.57,71.63,71.46,71.39,71.23,70.82,70.57,70.84,70.95,70.73,70.09,70.28,71.66,71.55,71.28,71.05,70.5,70.1,69.6,68.9,68.6,68.2,68.1,68.0,67.9,67.7,68.1,68.0,67.9,68.2,68.5,68.7,69.1,69.7,70.0,70.1,70.2,70.3,70.4,70.6,70.7,71.0,71.3 +Belgium,66.77,67.97,68.33,68.59,68.54,68.83,69.19,69.88,70.28,69.59,70.46,70.19,70.0,70.66,70.51,70.58,70.86,70.55,70.63,70.89,71.01,71.35,71.56,71.91,71.9,72.05,72.7,72.64,73.13,73.18,73.59,73.81,73.81,74.31,74.41,74.61,75.22,75.53,75.59,76.0,76.2,76.3,76.5,76.6,76.9,77.2,77.4,77.5,77.7,77.8,78.0,78.2,78.5,79.0,79.1,79.5,79.5,79.6,79.8,80.1,80.2,80.3,80.4,80.5,80.5,80.5 +Belize,55.15,55.7,56.27,56.82,57.37,57.91,58.46,58.99,59.54,60.08,60.64,61.2,61.78,62.36,62.95,63.53,64.11,64.67,65.21,65.72,66.21,66.66,67.11,67.52,67.93,68.32,68.7,69.06,69.43,69.78,70.13,70.47,70.8,71.09,71.34,71.51,71.6,71.61,71.54,71.4,71.2,71.1,70.8,70.6,70.5,70.4,69.7,69.5,69.3,69.0,68.8,69.3,69.6,69.9,70.0,70.3,70.6,70.7,70.9,71.2,71.2,71.3,71.3,71.5,71.7,71.9 +Benin,33.53,34.09,34.64,35.19,35.72,36.25,36.77,37.28,37.79,38.29,38.8,39.32,39.85,40.38,40.93,41.5,42.09,42.69,43.31,43.93,44.55,45.16,45.77,46.36,46.93,47.46,47.96,48.43,48.88,49.34,49.84,50.38,50.97,51.62,52.33,53.09,53.89,54.67,55.42,56.1,56.3,56.6,56.9,56.8,56.7,56.6,56.9,57.0,57.1,57.2,57.4,57.7,57.9,58.2,58.6,58.9,59.2,59.7,60.4,60.8,61.1,61.4,61.7,62.0,62.3,62.6 +Bhutan,30.94,31.47,32.01,32.56,33.12,33.68,34.25,34.81,35.38,35.94,36.49,37.04,37.57,38.12,38.68,39.28,39.94,40.66,41.45,42.31,43.23,44.2,45.18,46.2,47.21,48.22,49.22,50.21,51.18,52.12,53.05,53.96,54.87,55.78,56.69,57.61,58.54,59.48,60.44,61.4,61.9,62.4,62.8,63.1,63.8,64.7,65.1,65.6,66.5,65.9,67.5,68.1,68.5,68.9,69.3,69.8,70.3,70.7,70.9,71.4,71.7,71.9,72.2,72.4,72.7,73.0 +Bolivia,40.6,40.94,41.28,41.64,41.98,42.34,42.7,43.05,43.41,43.77,44.14,44.5,44.88,45.24,45.62,45.99,46.34,46.69,47.05,47.44,47.86,48.34,48.89,49.5,50.19,50.93,51.73,52.54,53.38,54.21,55.04,55.87,56.67,57.47,58.22,58.96,59.65,60.33,60.98,61.6,62.2,62.7,63.2,63.8,64.4,65.1,65.6,66.3,66.9,67.6,68.3,68.7,69.3,69.8,70.2,70.6,70.9,71.2,71.6,71.8,72.1,72.4,72.7,72.9,73.2,73.5 +Bosnia and 
Herzegovina,53.22,54.49,55.7,56.85,57.94,58.97,59.95,60.87,61.74,62.56,63.34,64.07,64.78,65.46,66.14,66.81,67.47,68.14,68.82,69.49,70.17,70.84,71.49,72.12,72.71,73.24,73.71,74.12,74.48,74.82,75.2,75.65,76.15,76.63,76.95,76.89,76.37,75.39,74.07,72.7,72.7,68.0,68.3,71.1,67.0,73.8,74.4,74.8,75.3,75.7,76.2,76.4,76.7,76.9,77.0,77.1,77.3,77.5,77.7,77.9,78.2,78.4,78.6,78.7,78.9,79.1 +Botswana,46.87,47.27,47.66,48.05,48.45,48.84,49.23,49.61,49.99,50.34,50.7,51.02,51.35,51.67,52.0,52.36,52.77,53.23,53.73,54.3,54.9,55.54,56.18,56.82,57.45,58.07,58.65,59.21,59.74,60.24,60.73,61.21,61.67,62.08,62.44,62.7,62.85,62.85,62.69,62.3,62.0,61.2,60.1,58.6,56.8,54.8,52.9,50.9,49.2,47.6,46.5,45.6,45.7,46.9,49.3,51.2,52.4,53.2,54.3,55.6,56.5,56.5,56.9,57.3,58.7,60.13 +Brazil,50.59,51.1,51.62,52.14,52.66,53.19,53.71,54.23,54.75,55.27,55.78,56.27,56.75,57.21,57.66,58.07,58.49,58.91,59.31,59.73,60.14,60.56,60.98,61.41,61.84,62.27,62.68,63.07,63.45,63.81,64.18,64.55,64.94,65.34,65.76,66.18,66.6,67.04,67.47,67.9,68.1,68.3,68.5,68.8,69.0,69.3,69.6,69.9,70.3,70.7,71.1,71.4,71.7,72.0,72.4,72.7,73.0,73.2,73.4,73.6,73.8,74.0,74.1,74.3,74.4,74.5 +Brunei,56.99,57.6,58.22,58.83,59.45,60.07,60.7,61.31,61.93,62.52,63.11,63.67,64.21,64.72,65.21,65.67,66.12,66.54,66.97,67.38,67.79,68.19,68.58,68.95,69.32,69.67,70.01,70.33,70.65,70.95,71.25,71.54,71.84,72.12,72.41,72.69,72.98,73.26,73.54,73.8,73.8,74.0,74.2,74.4,74.7,74.9,75.2,75.6,75.8,75.9,76.1,76.3,76.5,76.7,76.7,76.8,76.8,76.9,77.0,77.1,76.9,76.9,76.9,77.1,77.1,77.1 +Bulgaria,60.65,59.62,64.16,64.43,64.84,65.24,66.64,68.74,66.6,69.22,70.26,69.55,70.38,71.18,71.35,71.28,70.47,71.3,70.48,71.32,70.93,70.96,71.4,71.26,71.11,71.44,70.88,71.24,71.34,71.17,71.56,71.16,71.33,71.43,71.15,71.63,71.42,71.49,71.55,71.4,71.3,71.2,71.1,70.9,71.0,70.9,70.6,71.0,71.4,71.6,71.8,72.1,72.3,72.5,72.6,72.7,72.9,73.2,73.5,73.7,74.2,74.5,74.6,74.7,74.8,74.9 +Burkina Faso,30.65,31.18,31.69,32.21,32.72,33.21,33.71,34.21,34.71,35.21,35.72,36.23,36.75,37.27,37.8,38.3,38.8,39.3,39.78,40.27,40.75,41.25,41.78,42.36,43.0,43.74,44.61,45.56,46.58,47.61,48.58,49.45,50.17,50.71,51.08,51.28,51.38,51.42,51.42,51.4,51.4,51.3,51.3,51.3,51.3,51.5,51.6,51.8,52.2,52.6,53.2,53.8,54.5,55.1,55.9,56.6,57.4,58.0,58.5,59.0,59.5,59.9,60.3,60.6,60.9,61.2 +Burundi,38.19,38.45,38.72,38.98,39.25,39.51,39.77,40.04,40.3,40.58,40.85,41.13,41.41,41.69,41.95,42.18,42.37,42.52,42.65,42.76,42.91,43.1,43.35,43.66,44.02,44.43,44.84,45.24,45.6,45.93,46.22,46.49,46.75,46.95,47.08,47.05,46.88,46.54,46.1,45.6,45.4,45.3,45.1,45.0,44.5,44.3,45.0,45.5,46.3,46.7,48.4,49.8,51.3,53.0,54.7,56.4,57.9,59.1,60.0,60.4,60.8,61.1,61.3,61.4,61.4,61.4 +Cambodia,40.5,40.81,41.08,41.32,41.52,41.7,41.86,41.99,42.14,42.29,42.47,42.7,42.95,43.2,43.45,43.73,44.0,44.13,44.03,43.28,41.67,39.73,37.58,34.94,21.69,19.04,18.1,19.55,21.91,28.16,38.0,44.24,49.43,53.22,55.5,56.49,56.82,56.99,57.22,57.6,57.9,58.2,58.1,58.0,58.1,58.3,58.7,59.0,59.5,60.0,60.8,61.6,62.4,63.2,64.0,64.8,65.4,66.1,66.6,67.0,67.6,68.2,68.7,69.1,69.4,69.7 +Cameroon,39.08,39.51,39.94,40.41,40.87,41.37,41.88,42.39,42.93,43.46,44.0,44.53,45.07,45.59,46.13,46.67,47.22,47.79,48.37,48.97,49.59,50.22,50.85,51.49,52.13,52.74,53.36,53.95,54.52,55.06,55.56,56.03,56.45,56.83,57.17,57.48,57.75,58.01,58.22,58.4,58.2,57.9,57.4,57.0,56.5,56.2,55.5,55.0,54.7,54.3,54.2,54.2,54.3,54.4,54.9,55.4,55.7,56.6,57.3,57.8,58.1,58.5,59.0,59.1,59.4,59.7 
+Canada,68.53,68.72,69.1,69.96,70.02,70.0,69.92,70.58,70.62,71.0,71.22,71.25,71.26,71.64,71.74,71.86,72.07,72.23,72.39,72.58,72.91,72.81,73.04,73.12,73.41,73.84,74.13,74.46,74.81,75.05,75.46,75.67,76.04,76.33,76.31,76.46,76.76,76.82,77.09,77.4,77.6,77.7,77.8,77.9,78.0,78.3,78.6,78.8,79.0,79.2,79.5,79.6,79.8,80.1,80.2,80.5,80.6,80.8,81.1,81.3,81.6,81.6,81.6,81.7,81.7,81.7 +Cape Verde,48.45,48.63,48.81,49.0,49.19,49.38,49.57,49.76,49.95,50.12,50.27,50.43,50.59,50.77,51.0,51.32,51.75,52.32,53.0,53.78,54.65,55.57,56.5,57.41,58.3,59.16,60.0,60.82,61.62,62.41,63.19,63.95,64.69,65.43,66.12,66.75,67.33,67.85,68.3,68.7,68.6,68.6,68.4,68.3,68.3,68.2,68.2,68.2,68.2,68.4,68.6,68.7,68.9,69.1,69.3,69.6,69.6,70.4,70.7,71.1,71.4,71.9,72.3,72.7,72.9,73.1 +Central African Republic,33.34,33.79,34.26,34.72,35.18,35.62,36.07,36.53,36.97,37.43,37.89,38.36,38.85,39.36,39.92,40.5,41.15,41.84,42.57,43.36,44.19,45.04,45.91,46.77,47.6,48.36,49.07,49.7,50.21,50.61,50.86,50.96,50.95,50.81,50.57,50.21,49.8,49.34,48.86,48.4,48.1,48.0,47.5,47.2,46.7,46.3,45.9,45.7,45.5,45.3,45.2,45.2,45.2,45.4,45.5,45.8,46.2,46.8,47.6,47.9,48.1,48.5,47.8,48.2,49.6,51.04 +Chad,37.29,37.69,38.09,38.49,38.9,39.31,39.72,40.14,40.54,40.95,41.35,41.76,42.17,42.58,43.01,43.48,43.98,44.54,45.12,45.72,46.33,46.91,47.47,47.98,48.45,48.89,49.31,49.72,50.14,50.56,50.97,51.38,51.78,52.15,52.51,52.81,53.09,53.33,53.52,53.7,54.3,53.9,54.0,53.6,53.6,53.0,52.5,52.1,51.7,51.5,51.7,51.9,52.1,52.6,53.0,53.1,54.0,54.3,55.2,55.8,56.1,56.3,56.6,56.8,57.4,58.01 +Channel Islands,68.71,69.09,69.43,69.72,69.97,70.19,70.37,70.52,70.64,70.74,70.83,70.93,71.03,71.14,71.27,71.39,71.51,71.62,71.73,71.82,71.92,72.02,72.13,72.26,72.41,72.58,72.77,72.98,73.21,73.44,73.67,73.89,74.1,74.3,74.49,74.68,74.87,75.07,75.29,75.51,75.73,75.94,76.14,76.34,76.53,76.72,76.92,77.14,77.37,77.61,77.87,78.14,78.41,78.67,78.93,79.16,79.38,79.57,79.75,79.9,80.05,80.19,80.32,80.47,80.61,80.75 +Chile,54.35,54.56,54.79,55.03,55.29,55.57,55.86,56.16,56.5,56.85,57.23,57.63,58.07,58.54,59.03,59.54,60.07,60.61,61.17,61.74,62.34,62.98,63.63,64.31,65.02,65.75,66.5,67.25,67.99,68.7,69.36,69.97,70.51,71.0,71.42,71.8,72.14,72.47,72.79,73.1,74.1,75.0,75.2,75.3,75.4,75.7,76.2,76.6,76.9,77.3,77.4,77.7,77.8,78.0,78.2,78.2,78.3,78.5,78.5,78.5,78.9,79.1,79.1,79.2,79.4,79.6 +China,41.98,42.91,43.85,45.7,47.2,49.57,49.62,49.17,37.36,30.53,32.95,43.29,50.64,52.0,54.28,55.37,56.9,57.87,59.38,61.0,62.04,61.36,60.97,60.63,60.78,60.46,61.94,62.15,62.95,63.92,64.2,65.28,65.49,65.68,65.87,66.05,66.23,66.39,66.56,66.7,67.0,67.2,67.5,67.9,68.4,68.8,69.1,69.4,69.6,69.8,70.0,70.2,70.9,71.4,71.9,72.6,73.1,73.4,73.9,74.3,74.9,75.3,75.7,75.9,76.2,76.5 +Colombia,49.7,50.93,52.08,53.16,54.15,55.07,55.91,56.69,57.39,58.03,58.63,59.18,59.71,60.21,60.7,61.16,61.6,62.03,62.43,62.83,63.23,63.64,64.08,64.53,65.04,65.58,66.17,66.79,67.43,68.07,68.67,69.24,69.72,70.13,70.48,70.74,70.96,71.14,71.32,71.5,71.1,71.1,71.4,71.6,72.0,72.2,72.8,73.1,73.2,73.3,73.5,73.7,74.5,74.7,75.1,75.3,75.9,76.2,76.2,76.4,77.0,77.3,77.5,77.8,78.0,78.2 +Comoros,40.58,40.91,41.25,41.61,41.99,42.38,42.78,43.19,43.61,44.04,44.47,44.89,45.32,45.75,46.18,46.63,47.1,47.58,48.09,48.61,49.12,49.63,50.12,50.59,51.03,51.46,51.89,52.3,52.72,53.15,53.59,54.03,54.48,54.93,55.36,55.77,56.15,56.5,56.81,57.1,57.4,57.8,58.2,58.5,58.9,58.4,59.4,60.0,61.4,62.1,63.0,63.8,64.8,65.5,66.0,66.3,66.6,67.1,66.7,67.7,67.2,67.6,67.8,68.0,68.1,68.2 +"Congo, Dem. 
Rep.",40.07,40.58,41.06,41.53,41.97,42.39,42.79,43.17,43.54,43.9,44.25,44.61,44.98,45.36,45.77,46.2,46.66,47.14,47.63,48.13,48.6,49.05,49.46,49.83,50.17,50.49,50.8,51.11,51.43,51.76,52.09,52.41,52.72,53.0,53.28,53.55,53.81,54.07,54.31,54.5,54.4,54.3,54.3,54.3,54.0,51.8,53.2,53.5,54.0,54.3,54.5,54.7,54.9,55.9,56.4,56.8,57.1,57.5,57.9,58.4,58.8,59.1,59.6,60.1,60.8,61.51 +"Congo, Rep.",41.81,42.56,43.32,44.05,44.78,45.5,46.21,46.92,47.6,48.25,48.88,49.47,50.04,50.55,51.02,51.45,51.84,52.21,52.54,52.85,53.14,53.42,53.69,53.94,54.2,54.45,54.71,54.97,55.22,55.45,55.65,55.81,55.93,55.98,55.94,55.79,55.54,55.21,54.78,54.3,54.4,54.4,53.5,53.2,52.6,52.2,46.3,49.9,51.6,52.5,53.5,54.3,55.0,55.8,56.7,57.8,58.3,58.8,59.8,60.4,60.9,61.3,61.5,61.5,61.5,61.5 +Costa Rica,56.6,57.19,57.79,58.38,58.98,59.57,60.17,60.77,61.37,61.97,62.56,63.13,63.7,64.26,64.8,65.33,65.85,66.35,66.84,67.34,67.86,68.4,68.95,69.53,70.12,70.75,71.38,72.0,72.62,73.2,73.73,74.22,74.66,75.04,75.37,75.66,75.9,76.14,76.37,76.6,76.5,76.6,76.6,76.7,76.8,76.8,77.0,77.2,77.5,77.7,78.0,78.2,78.4,78.7,79.0,79.3,79.6,79.8,79.8,79.8,79.9,80.0,80.1,80.2,80.3,80.4 +Cote d'Ivoire,32.0,32.54,33.1,33.71,34.36,35.03,35.75,36.49,37.24,38.0,38.74,39.46,40.17,40.84,41.51,42.21,42.93,43.7,44.53,45.38,46.27,47.15,48.02,48.85,49.63,50.37,51.06,51.7,52.31,52.87,53.38,53.87,54.31,54.69,55.02,55.26,55.43,55.5,55.46,55.3,54.9,54.4,53.7,53.2,52.5,52.3,52.3,52.2,52.2,52.0,52.1,52.3,52.6,52.8,53.4,54.1,54.9,55.4,56.0,56.6,57.0,57.5,58.1,58.5,59.1,59.71 +Croatia,60.57,61.08,61.6,62.1,62.58,63.06,63.52,63.98,64.41,64.85,65.26,65.66,66.05,66.43,66.8,67.16,67.52,67.87,68.22,68.54,68.86,69.14,69.4,69.63,69.83,70.0,70.16,70.3,70.42,70.56,70.71,70.89,71.08,71.31,71.54,71.78,72.0,72.22,72.4,72.6,71.9,72.3,72.9,73.4,73.0,73.4,73.4,73.5,73.8,74.2,74.6,74.9,75.1,75.3,75.7,75.9,76.0,76.2,76.4,76.7,77.1,77.4,77.6,77.8,77.8,77.8 +Cuba,58.53,59.12,59.71,60.29,60.89,61.48,62.07,62.66,63.25,63.85,64.47,65.09,65.71,66.35,66.99,67.6,68.2,68.78,69.32,69.84,70.34,70.82,71.29,71.74,72.18,72.59,72.96,73.3,73.59,73.84,74.05,74.22,74.36,74.48,74.57,74.62,74.65,74.67,74.67,74.7,74.8,74.7,74.7,74.8,75.0,75.2,75.4,75.6,75.8,76.2,76.4,76.8,76.9,77.0,77.1,77.3,77.5,77.6,77.7,77.8,77.9,78.0,78.0,78.1,78.2,78.3 +Cyprus,66.13,66.58,67.03,67.45,67.87,68.26,68.65,69.01,69.38,69.72,70.06,70.38,70.71,71.02,71.33,71.62,71.92,72.19,72.47,72.73,72.99,73.23,73.47,73.7,73.93,74.15,74.37,74.58,74.79,74.99,75.19,75.38,75.58,75.76,75.95,76.12,76.3,76.47,76.64,76.8,76.4,76.7,76.8,76.4,76.7,77.1,77.1,77.1,77.5,77.7,78.5,78.7,79.0,79.1,79.0,79.5,79.8,80.0,80.3,80.6,81.1,81.5,81.7,81.7,81.8,81.9 +Czech Republic,65.32,66.94,67.64,68.14,69.06,69.47,69.14,70.05,70.04,70.58,70.77,70.04,70.56,70.73,70.43,70.65,70.55,70.11,69.62,69.72,69.96,70.49,70.33,70.42,70.77,70.88,70.94,71.02,71.13,70.67,71.11,71.22,71.0,71.26,71.48,71.42,71.87,72.08,72.13,71.8,72.0,72.3,72.7,73.0,73.4,73.8,74.2,74.5,74.7,75.0,75.3,75.4,75.6,75.9,76.2,76.5,76.8,77.1,77.3,77.5,77.8,78.1,78.3,78.6,78.8,79.0 +Denmark,70.97,70.82,71.2,71.4,71.97,72.11,71.87,72.3,72.29,72.28,72.55,72.43,72.52,72.61,72.49,72.57,73.06,73.27,73.36,73.49,73.55,73.59,73.83,73.96,74.24,73.91,74.82,74.59,74.41,74.3,74.44,74.78,74.65,74.81,74.68,74.86,74.97,75.06,75.1,75.1,75.4,75.4,75.4,75.4,75.6,75.9,76.2,76.7,76.3,77.1,77.2,77.2,77.6,77.8,78.3,78.3,78.4,78.9,79.1,79.4,79.9,80.3,80.3,80.3,80.4,80.5 
+Djibouti,41.48,41.89,42.31,42.77,43.23,43.71,44.21,44.73,45.24,45.77,46.28,46.79,47.3,47.8,48.33,48.9,49.53,50.23,50.99,51.75,52.51,53.2,53.83,54.38,54.85,55.29,55.71,56.15,56.61,57.1,57.59,58.08,58.55,58.97,59.38,59.74,60.09,60.42,60.72,61.0,60.7,60.4,60.7,60.0,60.4,60.3,60.1,60.0,59.9,60.0,60.1,60.2,60.3,60.4,60.7,60.7,61.5,61.8,62.1,62.3,62.5,62.8,63.1,63.1,63.8,64.51 +Dominican Republic,45.6,46.5,47.39,48.27,49.15,50.01,50.87,51.71,52.54,53.37,54.17,54.97,55.75,56.52,57.28,58.02,58.75,59.47,60.16,60.83,61.47,62.09,62.67,63.23,63.75,64.25,64.73,65.19,65.65,66.12,66.6,67.11,67.63,68.18,68.75,69.34,69.96,70.58,71.2,71.8,72.2,72.5,72.5,72.5,72.6,72.6,72.9,72.9,73.2,73.3,73.4,73.5,73.5,73.1,73.3,73.5,73.7,74.1,74.3,74.4,74.6,74.7,74.9,75.1,75.3,75.5 +Ecuador,48.06,48.64,49.23,49.87,50.54,51.23,51.93,52.65,53.38,54.09,54.77,55.42,56.01,56.53,57.02,57.47,57.89,58.32,58.76,59.21,59.67,60.16,60.67,61.18,61.73,62.3,62.9,63.51,64.16,64.82,65.49,66.17,66.85,67.53,68.18,68.83,69.46,70.06,70.64,71.2,71.4,71.7,71.8,72.2,72.3,72.5,72.7,72.8,73.1,73.2,73.4,73.6,73.7,73.9,74.1,74.3,74.5,74.7,74.9,75.1,75.3,75.5,75.6,75.8,75.9,76.0 +Egypt,39.32,40.72,42.03,43.22,44.3,45.29,46.17,46.97,47.68,48.31,48.89,49.43,49.94,50.42,50.88,51.29,51.65,51.97,52.25,52.54,52.88,53.31,53.84,54.46,55.17,55.93,56.69,57.45,58.16,58.85,59.52,60.21,60.93,61.65,62.38,63.07,63.7,64.27,64.76,65.2,65.4,66.1,66.4,66.7,67.4,67.9,68.2,68.6,69.0,69.7,69.7,69.8,69.8,69.9,70.1,70.1,70.3,70.2,70.1,70.1,70.4,70.5,71.0,71.3,71.5,71.7 +El Salvador,44.11,45.06,45.99,46.9,47.8,48.68,49.55,50.39,51.22,52.02,52.77,53.5,54.18,54.81,55.4,55.93,56.41,56.84,57.24,57.57,57.85,58.07,58.22,58.33,58.36,58.33,58.22,58.09,57.98,57.96,58.13,58.53,59.19,60.11,61.24,62.54,63.91,65.28,66.55,67.7,68.1,68.9,69.3,69.6,70.0,70.3,70.8,71.0,71.6,71.9,71.7,72.5,72.6,72.8,73.0,73.3,73.5,73.7,73.8,74.1,74.3,74.5,74.6,74.8,74.9,75.0 +Equatorial Guinea,34.55,34.9,35.25,35.59,35.95,36.3,36.65,36.99,37.34,37.69,38.04,38.38,38.73,39.08,39.44,39.78,40.13,40.48,40.82,41.17,41.52,41.87,42.21,42.56,42.91,43.28,43.65,44.04,44.44,44.85,45.26,45.69,46.11,46.52,46.92,47.33,47.73,48.14,48.52,48.9,48.7,48.7,48.6,48.5,48.5,48.9,50.3,51.2,52.0,52.9,54.0,54.9,55.3,55.9,56.0,56.8,57.1,57.5,58.0,58.6,58.7,59.4,60.5,61.0,61.0,61.0 +Eritrea,36.47,36.75,37.02,37.29,37.58,37.86,38.14,38.42,38.73,39.03,39.35,39.69,40.04,40.41,40.81,41.22,41.66,42.1,42.56,43.02,43.47,43.92,44.35,44.75,45.14,45.49,45.8,46.09,46.38,46.66,46.97,47.33,47.74,48.21,48.77,49.38,50.06,50.8,51.58,52.4,53.4,54.9,56.2,57.0,57.8,58.4,59.0,58.8,52.2,37.6,59.9,60.0,59.9,60.0,59.9,60.0,60.1,60.1,60.1,60.1,60.2,60.3,60.4,60.6,60.7,60.8 +Estonia,59.91,61.13,63.7,65.05,65.73,67.36,67.84,68.29,68.72,69.42,69.74,69.93,69.99,70.74,70.81,70.78,71.08,70.7,70.4,70.51,70.71,70.48,70.83,70.94,70.26,69.88,70.01,69.87,69.66,69.75,69.62,70.03,69.95,69.83,69.97,71.11,71.13,71.17,70.73,70.1,69.6,69.3,68.2,66.3,67.7,69.8,70.0,69.5,70.2,70.4,70.0,70.9,71.5,72.0,72.5,72.9,73.0,74.2,74.9,76.4,76.3,76.7,77.5,77.6,77.8,78.0 +Ethiopia,33.09,33.41,33.8,34.23,34.72,35.25,32.41,30.37,37.08,37.72,38.35,38.94,39.49,39.36,38.13,39.09,41.09,41.38,41.65,41.9,42.14,41.98,39.85,37.71,38.78,42.86,42.41,42.07,42.74,42.8,42.87,42.93,42.5,39.46,35.43,41.39,43.95,44.4,44.82,45.2,46.9,47.8,48.4,48.8,49.2,50.0,50.6,51.1,50.6,52.1,52.7,53.6,54.3,55.2,56.1,57.2,58.6,60.0,61.2,62.1,62.9,63.6,64.2,64.7,65.2,65.7 
+Fiji,51.3,51.85,52.38,52.9,53.4,53.89,54.36,54.81,55.26,55.7,56.12,56.54,56.94,57.35,57.75,58.14,58.52,58.89,59.26,59.61,59.96,60.29,60.6,60.91,61.21,61.5,61.8,62.09,62.37,62.65,62.92,63.2,63.46,63.71,63.96,64.2,64.43,64.66,64.88,65.1,65.1,65.0,64.8,64.7,64.5,64.3,64.1,64.2,64.1,64.2,64.4,64.5,64.6,64.7,64.8,64.8,64.9,64.9,64.9,65.2,65.3,65.4,65.6,65.7,65.8,65.9 +Finland,65.68,66.56,66.63,67.59,67.39,68.01,67.51,68.65,68.83,69.03,69.07,68.78,69.19,69.4,69.16,69.68,69.86,69.82,69.7,70.4,70.22,70.91,71.42,71.34,71.89,72.04,72.56,73.13,73.42,73.71,74.03,74.6,74.51,74.82,74.49,74.86,74.89,74.85,75.07,75.1,75.4,75.7,76.0,76.4,76.7,76.8,77.1,77.3,77.5,77.8,78.1,78.3,78.5,78.8,79.0,79.2,79.4,79.6,79.8,80.0,80.3,80.5,80.8,80.9,80.9,80.9 +France,66.17,67.46,67.4,68.27,68.54,68.57,69.0,70.24,70.27,70.49,71.07,70.61,70.46,71.43,71.26,71.67,71.67,71.66,71.4,72.29,72.27,72.52,72.69,73.04,73.13,73.38,73.99,74.12,74.43,74.53,74.69,75.07,75.06,75.56,75.67,75.95,76.55,76.78,76.91,77.2,77.3,77.6,77.7,78.0,78.2,78.4,78.7,78.7,78.8,79.1,79.2,79.4,79.6,80.2,80.4,80.7,81.0,81.1,81.2,81.4,81.6,81.6,81.7,81.7,81.8,81.9 +French Guiana,52.52,53.05,53.58,54.12,54.67,55.22,55.78,56.37,57.0,57.68,58.44,59.28,60.19,61.14,62.1,63.0,63.8,64.46,64.97,65.34,65.57,65.71,65.81,65.91,66.04,66.24,66.51,66.87,67.3,67.79,68.31,68.83,69.33,69.79,70.2,70.57,70.92,71.27,71.6,71.94,72.27,72.61,72.93,73.25,73.56,73.84,74.1,74.34,74.55,74.75,74.92,75.07,75.21,75.35,75.5,75.65,75.82,76.01,76.21,76.43,76.65,76.89,77.12,77.35,77.58,77.81 +French Polynesia,46.52,48.28,49.86,51.27,52.5,53.55,54.44,55.18,55.78,56.28,56.71,57.09,57.47,57.85,58.24,58.65,59.06,59.45,59.83,60.18,60.52,60.84,61.15,61.47,61.82,62.23,62.72,63.28,63.9,64.56,65.22,65.84,66.39,66.87,67.27,67.59,67.88,68.15,68.42,68.7,69.01,69.33,69.68,70.05,70.43,70.82,71.21,71.59,71.96,72.31,72.67,73.03,73.4,73.77,74.13,74.48,74.81,75.11,75.38,75.62,75.84,76.05,76.26,76.47,76.69,76.91 +Gabon,35.84,36.34,36.8,37.19,37.54,37.83,38.1,38.33,38.56,38.83,39.15,39.56,40.07,40.7,41.42,42.21,43.06,43.9,44.74,45.55,46.35,47.13,47.9,48.68,49.45,50.23,51.01,51.81,52.61,53.42,54.24,55.07,55.88,56.66,57.4,58.04,58.58,59.0,59.32,59.5,59.8,60.2,60.1,59.9,59.8,59.6,59.9,60.0,59.7,59.3,59.0,59.4,59.4,59.4,60.1,60.9,61.6,61.7,62.1,63.0,63.3,63.9,64.4,65.0,65.9,66.81 +Gambia,31.85,32.33,32.78,33.22,33.65,34.06,34.46,34.86,35.27,35.7,36.16,36.68,37.26,37.91,38.66,39.47,40.36,41.3,42.3,43.31,44.36,45.42,46.47,47.51,48.56,49.58,50.6,51.61,52.62,53.61,54.59,55.55,56.48,57.37,58.21,58.97,59.65,60.26,60.81,61.3,61.5,61.5,62.0,62.3,62.6,62.8,63.1,63.4,63.4,63.6,63.9,63.8,64.4,64.7,64.9,65.2,65.3,65.7,66.0,66.5,67.1,67.5,67.8,68.0,68.1,68.2 +Georgia,59.96,60.36,60.75,61.15,61.54,61.93,62.32,62.72,63.11,63.5,63.9,64.31,64.71,65.11,65.52,65.9,66.26,66.6,66.93,67.24,67.54,67.85,68.17,68.47,68.76,69.0,69.19,69.31,69.37,69.4,69.42,69.46,69.52,69.62,69.75,69.86,69.94,69.96,69.95,69.9,69.9,69.4,69.2,70.2,70.7,71.2,71.3,71.4,71.4,71.4,71.7,71.6,71.7,71.5,71.8,71.9,72.1,71.8,72.1,72.2,72.2,72.4,72.5,72.6,72.9,73.2 +Germany,67.08,67.4,67.7,68.0,68.28,68.57,68.49,69.23,69.34,69.26,69.85,70.01,70.1,70.66,70.65,70.77,70.99,70.64,70.48,70.72,70.94,71.16,71.41,71.71,71.56,72.02,72.63,72.6,72.96,73.14,73.37,73.69,73.97,74.44,74.55,74.75,75.15,75.33,75.51,75.4,75.6,76.0,76.1,76.4,76.6,76.9,77.3,77.6,77.8,78.1,78.4,78.6,78.8,79.2,79.4,79.7,79.9,80.0,80.1,80.3,80.5,80.6,80.7,80.7,80.8,80.9 
+Ghana,41.66,42.22,42.76,43.3,43.83,44.36,44.87,45.37,45.86,46.34,46.8,47.25,47.66,48.07,48.44,48.8,49.14,49.46,49.78,50.08,50.39,50.7,51.02,51.35,51.68,52.0,52.33,52.63,52.95,53.26,53.6,53.95,54.34,54.76,55.23,55.75,56.31,56.89,57.47,58.0,58.4,58.7,59.5,59.6,60.0,60.1,59.8,60.1,60.1,60.0,59.9,60.0,60.2,60.5,60.8,61.2,61.6,62.0,62.4,62.9,63.5,64.1,64.5,64.8,65.3,65.8 +Greece,65.57,65.72,65.92,66.16,66.46,66.79,67.16,67.57,67.99,68.41,68.8,69.14,69.44,69.69,69.91,70.12,70.34,70.59,70.88,71.2,71.53,71.85,72.13,72.39,72.62,72.85,73.1,73.38,73.68,74.01,74.33,74.64,74.94,75.21,75.47,75.73,76.01,76.32,76.66,77.0,77.1,77.1,77.5,77.7,77.8,77.9,78.1,78.2,78.3,78.6,78.9,79.1,79.3,79.4,79.6,80.0,79.8,80.2,80.2,80.4,80.5,80.6,81.0,81.0,81.0,81.0 +Greenland,43.94,45.59,48.67,51.76,54.85,57.94,58.82,59.71,60.6,61.49,61.85,62.22,62.59,62.97,63.34,63.71,64.08,64.45,64.82,65.19,65.01,64.84,64.66,64.49,64.31,64.14,63.96,63.78,63.61,63.09,62.71,62.8,62.89,63.05,63.42,63.81,64.22,64.14,64.22,64.6,65.1,65.5,65.9,66.3,66.5,66.8,66.9,67.2,67.5,67.8,68.0,68.3,68.5,68.8,69.1,69.5,70.0,70.3,70.6,70.8,71.2,71.6,71.8,72.0,72.1,72.2 +Grenada,55.81,56.39,56.97,57.52,58.07,58.61,59.12,59.63,60.11,60.59,61.05,61.49,61.93,62.35,62.76,63.16,63.54,63.91,64.27,64.62,64.97,65.29,65.62,65.92,66.22,66.52,66.79,67.07,67.33,67.6,67.86,68.1,68.35,68.59,68.83,69.06,69.28,69.5,69.7,69.9,70.2,70.2,70.0,70.4,70.7,70.8,70.8,70.6,70.6,70.5,70.3,70.2,70.2,69.3,70.3,70.5,70.7,70.8,70.9,71.0,71.0,71.1,71.2,71.4,71.5,71.6 +Guadeloupe,52.09,52.94,53.77,54.57,55.35,56.11,56.84,57.55,58.24,58.91,59.58,60.23,60.87,61.51,62.14,62.75,63.34,63.91,64.46,64.98,65.49,65.99,66.49,66.97,67.46,67.93,68.4,68.86,69.31,69.75,70.18,70.6,71.01,71.42,71.82,72.21,72.6,72.98,73.35,73.72,74.08,74.44,74.79,75.14,75.48,75.82,76.15,76.48,76.8,77.12,77.43,77.74,78.04,78.35,78.65,78.95,79.25,79.55,79.85,80.14,80.43,80.69,80.95,81.18,81.41,81.64 +Guam,56.53,57.04,57.55,58.08,58.6,59.12,59.65,60.18,60.71,61.24,61.76,62.28,62.79,63.29,63.78,64.26,64.72,65.18,65.63,66.06,66.49,66.9,67.3,67.7,68.07,68.43,68.79,69.13,69.47,69.8,70.12,70.42,70.73,71.02,71.3,71.58,71.84,72.09,72.35,72.6,72.4,72.4,72.5,72.7,73.0,73.2,69.4,73.4,73.5,73.6,73.6,73.6,73.5,73.3,73.1,72.7,72.4,72.1,71.8,71.6,71.5,71.5,71.6,71.6,71.7,71.8 +Guatemala,42.06,42.44,42.83,43.27,43.73,44.23,44.77,45.32,45.91,46.51,47.12,47.76,48.4,49.05,49.73,50.43,51.16,51.93,52.72,53.5,54.27,55.0,55.67,56.28,56.82,57.32,57.78,58.22,58.66,59.12,59.6,60.1,60.62,61.16,61.72,62.28,62.86,63.45,64.02,64.6,64.0,63.8,64.2,64.6,66.9,68.1,67.7,67.7,68.8,68.8,69.3,70.0,70.1,70.2,69.8,70.2,71.0,71.2,70.9,71.2,71.6,72.1,72.3,72.4,72.6,72.8 +Guinea,33.12,33.44,33.74,34.04,34.35,34.64,34.92,35.2,35.46,35.71,35.95,36.17,36.37,36.57,36.77,36.96,37.16,37.37,37.61,37.89,38.2,38.57,38.97,39.43,39.94,40.47,41.04,41.63,42.25,42.92,43.66,44.47,45.37,46.32,47.33,48.35,49.37,50.33,51.22,52.0,52.3,52.5,53.0,53.1,53.4,53.8,54.0,54.0,54.0,54.2,54.4,54.7,55.1,55.6,56.0,56.4,56.8,57.1,57.5,57.9,58.2,58.5,58.8,58.6,59.1,59.6 +Guinea-Bissau,39.65,40.03,40.42,40.81,41.2,41.58,41.97,42.36,42.75,43.14,43.39,43.64,43.89,44.15,44.39,44.63,44.86,45.09,45.29,45.5,45.71,45.91,46.12,46.33,46.54,46.77,47.02,47.27,47.54,47.83,48.13,48.45,48.78,49.13,49.49,49.87,50.26,50.67,51.09,51.5,51.7,51.8,52.0,52.2,52.3,52.6,52.8,51.7,52.5,52.8,52.7,52.7,52.8,52.8,52.9,53.0,53.2,53.6,53.9,54.3,54.5,54.8,55.1,55.3,55.6,55.9 
+Guyana,57.51,57.68,57.85,58.04,58.21,58.38,58.56,58.73,58.9,59.08,59.24,59.41,59.58,59.75,59.92,60.09,60.25,60.43,60.59,60.75,60.92,61.08,61.24,61.4,61.56,61.72,61.88,62.03,62.18,62.34,62.5,62.67,62.85,63.02,63.21,63.41,63.61,63.8,64.01,64.2,64.3,64.5,64.4,64.5,64.4,64.3,64.3,64.3,64.3,64.2,63.9,63.5,63.7,64.2,64.4,64.8,64.9,65.0,65.3,65.5,65.6,65.9,66.2,66.4,66.8,67.2 +Haiti,36.56,37.22,37.87,38.5,39.12,39.74,40.34,40.93,41.52,42.1,42.68,43.26,43.82,44.38,44.93,45.43,45.9,46.33,46.73,47.1,47.45,47.81,48.17,48.53,48.9,49.28,49.63,49.97,50.3,50.62,50.94,51.29,51.64,52.02,52.42,52.81,53.21,53.59,53.95,54.3,54.4,54.9,54.7,55.4,56.2,56.7,57.0,57.5,58.0,58.7,59.2,59.6,59.7,58.6,60.0,60.3,60.8,61.0,61.7,32.2,62.4,62.9,63.4,63.8,64.3,64.8 +Honduras,41.86,42.39,42.95,43.54,44.16,44.83,45.52,46.23,46.97,47.71,48.47,49.21,49.94,50.65,51.35,52.02,52.68,53.34,54.0,54.68,55.37,56.07,56.8,57.56,58.34,59.15,59.97,60.8,61.65,62.5,63.36,64.24,65.12,66.01,66.86,67.67,68.42,69.11,69.73,70.3,70.3,70.1,69.9,70.1,70.1,70.1,70.2,63.9,70.3,70.5,70.6,70.7,70.8,71.0,71.2,71.4,71.6,71.8,71.9,72.0,72.2,72.3,72.6,72.8,73.0,73.2 +"Hong Kong, China",62.38,62.9,63.43,63.98,64.54,65.11,65.69,66.28,66.87,67.45,68.01,68.55,69.05,69.52,69.96,70.37,70.77,71.15,71.53,71.89,72.25,72.58,72.9,73.2,73.49,73.78,74.06,74.35,74.64,74.93,75.22,75.5,75.77,76.03,76.28,76.53,76.78,77.02,77.27,77.52,77.77,78.01,78.25,78.48,78.72,78.99,79.29,79.63,79.99,80.36,80.73,81.08,81.4,81.68,81.92,82.12,82.31,82.49,82.66,82.84,83.02,83.2,83.38,83.56,83.73,83.9 +Hungary,62.48,64.05,63.89,65.46,66.91,66.07,66.44,67.45,67.35,68.13,69.06,68.0,69.02,69.52,69.22,69.98,69.55,69.38,69.45,69.29,69.19,69.82,69.69,69.41,69.46,69.75,70.02,69.56,69.77,69.18,69.24,69.47,69.03,69.07,69.01,69.22,69.69,70.09,69.53,69.5,69.2,69.1,69.2,69.5,70.1,70.5,70.9,71.1,71.3,71.8,72.3,72.6,72.7,72.9,73.1,73.3,73.6,73.9,74.3,74.6,75.0,75.5,76.1,76.5,76.7,76.9 +Iceland,71.12,72.57,72.39,73.45,73.4,73.08,73.58,73.55,72.78,74.22,73.6,73.82,73.13,73.72,74.0,73.4,73.9,74.12,73.9,74.0,73.75,74.66,74.52,74.59,75.57,76.94,76.35,76.66,76.88,76.92,76.61,77.26,76.91,77.71,77.85,78.38,77.53,77.39,78.46,78.3,78.4,78.6,78.8,79.1,78.9,79.4,79.6,79.9,80.2,80.5,80.8,81.0,81.3,81.5,81.7,81.8,82.1,82.4,82.5,82.8,82.9,83.1,83.2,83.3,83.3,83.3 +India,35.1,35.76,36.44,37.11,37.79,38.48,39.16,39.85,40.56,41.26,41.99,42.72,43.46,44.23,44.98,45.73,46.49,47.21,47.93,48.65,49.35,50.08,50.81,51.53,52.25,52.93,53.56,54.14,54.65,55.1,55.51,55.86,56.19,56.51,56.81,57.11,57.39,57.65,57.93,58.2,58.5,58.8,59.1,59.5,59.9,60.2,60.5,60.8,61.2,61.5,61.9,62.3,62.8,63.2,63.6,63.9,64.3,64.7,65.0,65.4,65.7,66.1,66.5,66.9,67.2,67.5 +Indonesia,36.99,37.93,38.86,39.78,40.68,41.57,42.45,43.32,44.17,45.01,45.83,46.65,47.45,48.24,43.77,44.18,50.54,51.27,52.0,52.71,53.4,54.09,54.75,55.41,56.04,56.67,57.27,57.87,58.45,59.01,59.57,60.12,60.64,61.16,61.66,62.15,62.63,63.1,63.55,64.0,64.5,64.9,65.3,65.7,66.1,66.4,66.7,67.0,67.2,67.5,67.8,68.0,68.2,66.7,68.7,68.9,69.2,69.4,69.6,69.8,70.1,70.3,70.6,70.8,71.1,71.4 +Iran,40.29,40.92,41.56,42.19,42.84,43.47,44.11,44.74,45.38,46.0,46.61,47.22,47.83,48.43,49.04,49.66,50.3,50.98,51.67,52.43,53.28,54.24,55.24,56.24,57.1,57.64,57.78,57.52,56.95,56.24,55.62,55.32,55.49,56.19,57.39,59.01,60.83,62.67,64.43,66.0,67.8,68.5,69.1,69.6,69.9,69.8,70.3,70.8,71.3,71.4,71.3,71.3,70.1,71.5,71.9,72.4,72.8,73.1,73.4,73.7,74.1,74.3,74.5,74.6,74.6,74.6 
+Iraq,35.08,36.58,38.04,39.45,40.81,42.11,43.38,44.61,45.79,46.96,48.11,49.24,50.36,51.46,52.52,53.51,54.42,55.24,55.96,56.61,57.21,57.77,58.31,58.81,59.19,59.35,59.26,58.94,58.44,57.91,57.52,57.41,57.68,58.34,59.36,60.65,62.05,63.41,64.65,65.7,63.9,65.4,65.4,65.4,65.3,65.3,65.2,65.7,65.9,65.8,66.4,66.1,66.1,66.3,65.7,65.1,65.3,66.6,67.1,67.3,67.7,68.1,68.3,67.7,67.4,67.1 +Ireland,65.07,67.52,68.3,68.44,68.46,69.43,69.51,69.84,69.99,70.76,70.24,70.57,70.85,71.12,71.35,70.89,71.95,71.67,71.62,71.68,72.5,71.86,72.11,72.08,72.68,72.81,72.98,72.98,73.28,73.66,74.04,74.34,74.4,74.87,74.84,74.93,75.76,75.82,75.87,76.3,76.7,76.8,76.9,77.3,77.1,77.5,77.4,77.6,77.7,77.8,78.4,78.8,79.1,79.3,79.7,79.8,80.1,80.1,80.3,81.0,80.6,81.1,81.5,81.6,81.7,81.8 +Israel,64.42,65.04,65.62,66.15,66.65,67.1,67.51,67.89,68.24,68.55,68.85,69.13,69.41,69.68,69.93,70.17,70.39,70.6,70.78,70.96,71.13,71.33,71.54,71.78,72.04,72.33,72.62,72.9,73.19,73.47,73.74,73.99,74.49,74.78,75.1,74.92,75.29,75.65,76.24,76.7,76.5,76.3,76.9,77.1,77.4,77.7,77.9,78.1,78.5,78.6,78.8,78.6,79.1,79.5,79.7,79.6,80.3,80.6,81.0,81.6,81.6,82.1,82.0,81.3,82.1,82.91 +Italy,65.3,65.93,66.56,67.88,68.23,67.62,67.79,68.85,69.3,69.19,69.82,69.21,69.32,70.37,70.24,70.99,71.03,70.85,70.87,71.62,71.87,72.15,72.09,72.81,72.72,73.07,73.44,73.78,74.11,74.07,74.46,74.93,74.75,75.51,75.62,75.94,76.36,76.54,76.94,77.0,77.0,77.3,77.6,77.8,78.1,78.3,78.7,78.9,79.3,79.6,79.8,80.1,80.1,80.9,81.1,81.2,81.3,81.5,81.6,81.9,82.0,82.0,82.1,82.1,82.2,82.3 +Jamaica,58.02,59.06,60.07,61.03,61.95,62.83,63.66,64.47,65.21,65.91,66.57,67.17,67.74,68.25,68.73,69.17,69.58,69.99,70.36,70.72,71.06,71.39,71.71,72.0,72.29,72.58,72.89,73.21,73.52,73.82,74.1,74.34,74.55,74.7,74.79,74.84,74.85,74.85,74.83,74.8,74.9,74.9,74.8,74.8,74.7,74.5,74.4,74.5,74.6,74.4,74.2,74.5,74.8,75.0,75.4,75.5,75.3,75.1,74.8,74.8,74.6,74.7,74.8,74.8,75.0,75.2 +Japan,60.98,63.02,63.36,64.6,65.76,65.62,65.49,67.11,67.49,67.78,68.43,68.71,69.79,70.26,70.31,71.12,71.41,71.73,71.96,72.05,72.87,73.39,73.45,73.88,74.38,74.78,75.35,75.67,76.18,76.16,76.57,77.08,77.11,77.5,77.8,78.22,78.63,78.54,78.97,79.0,79.1,79.3,79.4,79.8,79.7,80.2,80.4,80.5,80.6,81.0,81.3,81.6,81.7,81.9,82.0,82.2,82.4,82.5,82.7,82.7,82.6,82.9,83.0,83.1,83.2,83.3 +Jordan,45.56,46.45,47.34,48.23,49.09,49.95,50.8,51.65,52.48,53.3,54.12,54.94,55.75,56.55,57.35,58.13,58.9,59.66,60.42,61.15,61.87,62.59,63.3,63.98,64.64,65.28,65.89,66.46,67.0,67.51,68.0,68.45,68.9,69.33,69.75,70.15,70.52,70.87,71.2,71.5,71.9,72.2,72.2,72.4,72.5,72.6,72.8,73.0,73.2,73.4,73.6,73.8,74.0,74.1,74.5,75.5,76.3,76.9,77.5,77.9,78.1,78.2,78.3,78.4,78.5,78.6 +Kazakhstan,54.67,55.15,55.63,56.11,56.58,57.05,57.51,57.98,58.44,58.91,59.38,59.85,60.31,60.79,61.24,61.69,62.12,62.53,62.92,63.27,63.6,63.89,64.16,64.4,64.64,64.88,65.13,65.41,65.71,66.05,66.43,66.84,67.28,67.7,68.07,68.34,68.48,68.47,68.31,68.0,67.6,67.1,65.3,64.6,63.6,63.5,63.9,64.2,65.0,64.9,65.1,65.4,65.3,65.3,65.3,65.3,65.8,67.1,68.2,68.5,69.1,69.7,70.0,70.2,70.2,70.2 +Kenya,42.33,42.71,43.16,43.64,44.17,44.75,45.37,46.03,46.72,47.42,48.13,48.82,49.48,50.13,50.75,51.35,51.96,52.57,53.19,53.83,54.45,55.08,55.69,56.29,56.89,57.49,58.1,58.74,59.36,59.96,60.52,61.02,61.42,61.74,61.96,62.09,62.13,62.1,62.0,61.8,61.1,60.3,59.5,58.7,58.1,57.4,56.7,56.1,55.8,55.6,55.6,55.7,55.8,56.2,57.2,58.4,59.8,60.8,61.9,62.9,63.7,64.3,64.8,65.0,65.1,65.2 
+Kiribati,42.25,42.65,43.05,43.44,43.85,44.25,44.64,45.04,45.45,45.84,46.24,46.64,47.03,47.44,47.84,48.23,48.63,49.02,49.42,49.82,50.21,50.61,51.02,51.41,51.81,52.2,52.58,52.97,53.36,53.75,54.17,54.62,55.08,55.56,56.04,56.5,56.93,57.33,57.68,58.0,58.2,58.4,58.4,58.7,58.9,59.2,59.4,59.5,59.6,59.8,60.1,60.2,60.4,60.6,60.8,61.0,61.2,61.5,61.7,61.9,62.1,62.3,62.6,62.8,63.0,63.2 +Kuwait,52.95,54.13,55.27,56.36,57.43,58.45,59.44,60.38,61.29,62.15,62.98,63.77,64.53,65.24,65.92,66.58,67.2,67.8,68.38,68.93,69.46,69.97,70.46,70.93,71.39,71.85,72.29,72.72,73.14,73.58,74.0,74.41,74.81,75.22,75.59,75.95,76.29,76.62,76.92,77.2,64.4,80.0,78.7,77.6,76.5,76.0,76.2,76.3,77.3,77.7,77.6,78.2,78.5,78.1,77.7,77.7,77.7,77.3,77.4,78.5,79.0,79.1,79.7,80.2,80.3,80.4 +Kyrgyz Republic,52.07,52.52,52.96,53.41,53.86,54.31,54.75,55.2,55.64,56.09,56.54,56.99,57.44,57.9,58.34,58.76,59.19,59.58,59.95,60.3,60.61,60.88,61.14,61.38,61.6,61.83,62.05,62.3,62.57,62.89,63.23,63.62,64.04,64.45,64.86,65.23,65.54,65.77,65.93,66.0,65.9,65.6,65.3,65.0,65.1,65.2,65.3,65.6,65.8,65.9,66.0,65.9,66.0,66.2,66.5,66.7,67.0,67.3,67.7,67.9,68.5,69.0,69.4,69.6,69.8,70.0 +Lao,39.88,40.13,40.37,40.62,40.86,41.11,41.37,41.62,41.87,42.13,42.38,42.64,42.89,43.13,43.39,43.64,43.89,44.15,44.41,44.66,44.91,45.16,45.39,45.62,45.85,46.05,46.26,46.47,46.69,46.91,47.17,47.45,47.76,48.12,48.54,49.02,49.56,50.16,50.8,51.5,52.0,52.4,52.8,53.2,53.6,54.0,54.4,54.9,55.5,56.1,56.6,57.6,58.4,59.3,60.1,60.8,61.7,62.5,63.3,64.1,65.0,65.6,66.1,66.6,67.1,67.6 +Latvia,60.48,61.88,63.19,64.38,65.46,66.44,67.31,68.07,69.53,70.37,70.6,69.97,70.36,71.62,71.29,71.26,70.94,70.56,70.29,70.31,70.66,70.35,70.29,70.22,69.37,69.48,69.56,69.45,68.93,69.23,69.18,69.75,69.51,69.56,69.72,71.09,71.14,71.05,70.55,69.6,69.1,68.4,66.7,65.7,66.5,68.6,69.3,69.0,70.0,70.5,70.0,70.4,70.8,71.2,71.1,70.8,71.3,72.4,73.3,73.9,74.6,75.1,75.0,75.2,75.4,75.6 +Lebanon,59.61,60.04,60.45,60.85,61.23,61.6,61.95,62.28,62.6,62.9,63.19,63.47,63.74,64.0,64.25,64.5,64.76,65.01,65.27,65.52,65.75,65.98,66.18,66.37,66.54,66.69,66.83,66.96,67.08,67.21,67.36,67.52,67.68,67.87,68.07,68.29,68.53,68.78,69.03,69.3,71.9,72.2,72.5,73.0,73.4,74.0,74.4,74.9,75.6,75.9,76.3,76.6,76.9,77.1,77.3,77.4,77.5,77.8,77.9,78.1,76.6,78.5,78.6,78.7,78.9,79.1 +Lesotho,41.53,42.11,42.72,43.33,43.96,44.59,45.22,45.85,46.46,47.02,47.54,47.97,48.32,48.59,48.79,48.95,49.09,49.24,49.43,49.67,49.96,50.31,50.7,51.14,51.63,52.17,52.75,53.38,54.01,54.65,55.25,55.82,56.34,56.83,57.31,57.88,58.51,59.21,59.92,60.5,60.6,60.4,60.1,59.2,58.7,57.9,56.6,54.6,52.9,50.7,48.9,47.0,45.4,44.2,43.1,43.1,43.3,44.5,45.5,46.4,46.7,46.1,45.6,45.4,47.1,48.86 +Liberia,33.11,33.36,33.6,33.84,34.07,34.28,34.51,34.73,34.98,35.24,35.54,35.88,36.28,36.73,37.23,37.77,38.33,38.91,39.49,40.1,40.75,41.43,42.16,42.91,43.68,44.45,45.21,45.92,46.57,47.14,47.6,47.97,48.25,48.46,48.58,48.62,48.59,48.55,48.53,48.6,51.5,51.8,50.1,48.9,50.9,50.4,53.8,54.4,55.2,55.8,56.3,55.4,55.2,57.9,58.4,58.8,59.3,59.9,60.3,60.8,61.5,62.3,62.9,61.8,63.2,64.63 +Libya,38.07,37.73,37.66,37.89,38.39,39.18,40.22,41.5,42.97,44.59,46.28,48.0,49.69,51.28,52.77,54.15,55.45,56.69,57.88,59.01,60.11,61.16,62.16,63.13,64.06,64.95,65.81,66.62,67.4,68.13,68.82,69.46,70.04,70.58,71.09,71.56,72.03,72.48,72.94,73.4,73.7,73.8,74.2,74.4,74.6,74.6,74.8,74.8,74.9,74.8,75.0,75.0,75.1,75.2,75.4,75.5,75.5,75.6,75.7,75.9,60.5,75.5,75.8,75.0,74.1,73.21 
+Lithuania,63.9,64.52,65.14,65.77,66.38,66.99,67.59,68.19,67.73,70.33,70.52,69.46,70.64,72.0,71.76,71.92,71.99,71.68,71.3,71.16,72.1,71.34,71.7,71.63,71.24,71.38,71.14,70.93,70.8,70.78,70.77,71.17,71.09,70.6,70.78,72.45,72.26,72.1,71.79,71.5,70.5,70.3,69.1,68.7,69.0,70.2,71.1,71.3,71.8,72.1,71.6,72.1,72.1,72.2,71.7,71.5,71.4,72.1,73.6,73.9,74.3,74.7,74.9,75.0,75.2,75.4 +Luxembourg,65.38,65.71,66.04,66.37,66.67,66.98,67.27,67.55,67.83,68.99,69.49,68.59,68.8,68.98,69.31,69.21,69.59,70.17,69.73,69.47,69.35,70.59,70.34,70.42,70.37,70.31,71.61,71.57,72.25,72.42,72.22,72.31,73.19,72.94,73.51,74.44,73.97,74.57,74.49,75.2,75.5,75.8,76.2,76.5,76.9,77.1,77.4,77.7,78.1,78.5,78.7,79.0,79.1,79.5,80.0,80.3,80.6,81.0,81.2,81.3,81.5,81.7,81.9,82.1,82.2,82.3 +"Macao, China",60.25,60.79,61.32,61.84,62.37,62.89,63.41,63.92,64.43,64.93,65.42,65.9,66.36,66.81,67.24,67.66,68.06,68.45,68.83,69.2,69.56,69.91,70.26,70.61,70.95,71.29,71.62,71.94,72.26,72.57,72.88,73.17,73.46,73.75,74.03,74.31,74.58,74.84,75.1,75.36,75.61,75.86,76.1,76.33,76.56,76.78,77.0,77.21,77.42,77.63,77.83,78.04,78.25,78.46,78.67,78.89,79.1,79.32,79.54,79.75,79.97,80.19,80.4,80.61,80.82,81.03 +"Macedonia, FYR",53.65,54.61,55.53,56.4,57.25,58.04,58.79,59.51,60.2,60.85,61.49,62.11,62.72,63.32,63.92,64.51,65.08,65.62,66.14,66.63,67.08,67.48,67.83,68.14,68.41,68.61,68.76,68.88,68.98,69.08,69.21,69.4,69.63,69.92,70.26,70.6,70.93,71.23,71.48,71.7,71.7,71.6,71.5,71.7,71.8,72.1,72.3,72.4,72.6,72.9,73.0,73.3,73.4,73.6,73.8,74.1,74.3,74.5,74.7,75.2,75.6,75.8,76.0,76.2,76.5,76.8 +Madagascar,36.69,37.28,37.86,38.45,39.03,39.62,40.21,40.79,41.38,41.96,42.54,43.12,43.7,44.28,44.85,45.43,46.01,46.6,47.18,47.77,48.36,48.94,49.5,50.06,50.59,51.12,51.63,52.12,52.58,53.01,53.36,53.64,53.86,54.03,54.19,54.38,54.63,54.98,55.43,56.0,56.2,56.4,56.3,56.8,57.2,57.6,58.0,58.3,58.8,59.1,59.6,59.8,60.1,60.6,61.2,61.7,62.0,62.2,62.3,62.4,62.6,62.8,63.0,63.3,63.5,63.7 +Malawi,36.45,36.62,36.81,37.02,37.24,37.48,37.72,37.99,38.25,38.51,38.76,39.02,39.25,39.49,39.75,40.03,40.36,40.73,41.16,41.62,42.09,42.55,43.0,43.41,43.79,44.16,44.54,44.92,45.31,45.72,46.13,46.53,46.91,47.26,47.6,47.9,48.17,48.42,48.64,48.8,48.6,48.3,48.0,47.4,46.9,46.3,45.8,45.3,45.1,45.4,45.9,46.4,47.0,47.5,48.5,49.6,51.0,52.4,53.9,55.4,56.6,58.0,59.3,60.1,60.5,60.9 +Malaysia,54.05,54.72,55.39,56.06,56.72,57.37,58.01,58.65,59.27,59.89,60.48,61.07,61.63,62.17,62.71,63.21,63.7,64.17,64.63,65.08,65.51,65.93,66.34,66.73,67.13,67.5,67.86,68.21,68.56,68.89,69.22,69.53,69.84,70.14,70.45,70.73,71.01,71.28,71.54,71.8,72.0,72.2,72.4,72.4,72.4,72.5,72.8,73.0,73.1,73.3,73.6,73.8,73.9,74.0,74.3,74.5,74.5,74.5,74.3,74.4,74.6,74.7,74.9,75.1,75.3,75.5 +Maldives,33.9,34.18,34.49,34.86,35.27,35.72,36.22,36.78,37.39,38.07,38.82,39.64,40.54,41.48,42.47,43.48,44.49,45.48,46.44,47.37,48.25,49.12,49.98,50.82,51.69,52.59,53.53,54.51,55.53,56.58,57.62,58.64,59.64,60.6,61.51,62.41,63.3,64.19,65.08,66.0,66.7,67.3,67.9,68.6,69.3,70.0,70.8,71.7,72.3,73.0,73.7,74.4,75.3,74.7,76.9,77.5,78.1,78.5,78.9,79.2,79.6,79.8,79.9,80.0,80.0,80.0 +Mali,27.34,27.71,28.04,28.34,28.6,28.84,29.04,29.23,29.42,29.61,29.83,30.08,30.4,30.79,31.26,31.8,32.41,33.07,33.77,34.51,35.27,36.04,36.82,37.61,38.39,39.18,39.97,40.79,41.61,42.45,43.3,44.17,45.02,45.86,46.68,47.45,48.18,48.86,49.47,50.0,50.5,50.8,51.2,51.2,51.4,51.8,52.2,50.9,53.5,53.5,54.1,54.6,55.5,56.2,56.9,57.4,58.0,58.5,58.9,59.2,59.6,59.8,59.8,60.0,60.2,60.4 
+Malta,66.02,66.17,66.35,66.55,66.79,67.06,67.34,67.65,67.97,68.32,68.67,69.02,69.37,69.7,70.03,70.36,70.67,70.98,71.29,71.6,71.9,72.2,72.49,72.78,73.07,73.36,73.63,73.92,74.19,74.47,74.74,75.01,75.28,75.54,75.81,76.08,76.33,76.59,76.84,77.1,77.3,77.5,77.9,78.2,78.4,78.5,78.8,78.9,79.0,79.2,79.4,79.8,80.1,80.3,80.7,81.0,80.9,80.7,81.2,81.3,81.3,81.6,81.7,82.0,82.1,82.2 +Martinique,54.51,55.23,55.93,56.61,57.28,57.93,58.57,59.2,59.81,60.41,61.0,61.58,62.16,62.72,63.28,63.84,64.39,64.93,65.46,65.99,66.51,67.02,67.53,68.02,68.51,69.0,69.47,69.93,70.38,70.82,71.25,71.68,72.09,72.5,72.9,73.29,73.67,74.05,74.42,74.79,75.15,75.51,75.86,76.2,76.54,76.88,77.22,77.55,77.88,78.19,78.5,78.78,79.05,79.31,79.55,79.78,80.01,80.24,80.48,80.71,80.95,81.18,81.41,81.64,81.86,82.08 +Mauritania,37.95,38.53,39.14,39.77,40.42,41.09,41.78,42.48,43.2,43.91,44.62,45.31,45.96,46.59,47.18,47.73,48.26,48.78,49.27,49.77,50.25,50.73,51.2,51.69,52.19,52.73,53.29,53.89,54.51,55.13,55.75,56.34,56.9,57.41,57.86,58.28,58.64,58.96,59.25,59.5,60.2,60.4,60.7,60.7,61.2,61.5,62.0,62.5,63.2,63.8,64.2,64.9,65.5,65.9,66.3,67.0,67.5,67.9,68.2,68.6,68.8,69.1,69.3,69.6,69.7,69.8 +Mauritius,48.57,49.61,50.68,51.78,52.92,54.09,55.28,56.46,57.63,58.74,59.75,60.64,61.38,61.97,62.4,62.67,62.85,62.97,63.05,63.14,63.27,63.45,63.68,63.99,64.37,64.83,65.34,65.87,66.41,66.92,67.34,67.7,67.96,68.14,68.26,68.38,68.53,68.74,68.99,69.3,69.6,69.7,69.8,70.0,70.3,70.5,70.7,71.0,71.2,71.4,71.6,71.7,71.9,72.1,72.4,72.5,72.7,72.9,73.2,73.4,73.7,74.1,74.2,74.3,74.5,74.7 +Mayotte,45.38,46.68,47.92,49.11,50.24,51.32,52.34,53.3,54.22,55.09,55.92,56.72,57.5,58.25,58.98,59.7,60.39,61.07,61.73,62.36,62.99,63.59,64.17,64.74,65.3,65.84,66.36,66.88,67.38,67.86,68.34,68.8,69.25,69.69,70.12,70.54,70.95,71.35,71.75,72.14,72.53,72.91,73.28,73.64,74.0,74.35,74.7,75.03,75.36,75.69,76.01,76.33,76.64,76.95,77.24,77.53,77.8,78.05,78.29,78.52,78.74,78.96,79.19,79.42,79.65,79.88 +Mexico,49.27,50.37,51.42,52.43,53.39,54.29,55.14,55.94,56.67,57.34,57.95,58.49,58.96,59.4,59.78,60.15,60.53,60.91,61.32,61.77,62.25,62.75,63.29,63.83,64.39,64.95,65.51,66.05,66.58,67.09,67.58,68.05,68.52,68.97,69.4,69.84,70.26,70.67,71.09,71.5,71.9,72.1,72.4,72.7,73.0,73.3,73.6,73.7,74.1,74.6,74.9,74.9,74.9,75.2,75.1,75.4,75.6,75.4,75.3,75.4,75.7,75.7,75.4,75.6,75.9,76.2 +"Micronesia, Fed. 
Sts.",53.56,53.92,54.28,54.65,55.01,55.37,55.73,56.09,56.45,56.82,57.18,57.54,57.9,58.26,58.63,58.99,59.36,59.73,60.1,60.48,60.89,61.3,61.71,62.12,62.5,62.85,63.14,63.37,63.53,63.64,63.71,63.77,63.81,63.86,63.92,63.99,64.07,64.15,64.23,64.3,64.5,64.7,64.9,65.1,65.4,65.7,65.9,66.1,66.3,66.6,66.8,66.0,67.3,67.4,67.6,67.7,67.9,68.0,68.1,68.3,68.4,68.6,68.7,68.8,68.9,69.0 +Moldova,58.5,58.96,59.42,59.85,60.27,60.68,61.07,61.46,61.84,62.22,62.61,62.99,63.38,63.77,64.14,64.48,64.78,65.03,65.23,65.39,65.48,65.55,65.58,65.6,65.6,65.57,65.52,65.47,65.41,65.4,65.48,65.68,65.98,66.38,66.83,67.29,67.69,67.98,68.16,68.2,67.4,67.6,67.4,65.8,65.4,66.1,67.9,68.5,68.4,68.6,69.2,69.6,69.9,70.2,69.5,69.8,70.0,70.4,70.6,70.5,72.3,72.4,73.3,73.6,73.9,74.2 +Mongolia,43.09,43.41,43.83,44.34,44.96,45.66,46.46,47.33,48.25,49.2,50.15,51.08,51.94,52.74,53.48,54.16,54.8,55.43,56.02,56.58,57.08,57.49,57.82,58.06,58.22,58.31,58.36,58.4,58.46,58.56,58.73,59.0,59.34,59.76,60.22,60.71,61.18,61.61,61.98,62.3,62.3,62.2,62.0,62.0,61.7,61.7,61.9,62.1,62.3,62.5,62.7,62.9,63.1,63.4,63.6,64.0,64.4,64.8,65.0,65.2,65.6,66.0,66.4,66.8,67.1,67.4 +Montenegro,59.32,59.59,59.91,60.31,60.78,61.3,61.87,62.5,63.17,63.86,64.54,65.21,65.86,66.47,67.05,67.62,68.19,68.78,69.36,69.94,70.48,70.99,71.41,71.78,72.07,72.33,72.55,72.75,72.95,73.16,73.35,73.52,73.68,73.83,73.96,74.08,74.21,74.35,74.47,74.6,74.4,74.2,73.9,73.7,73.5,73.4,73.3,73.1,73.0,73.3,73.5,74.0,74.5,74.8,75.0,75.2,75.6,76.0,76.3,76.5,76.7,76.8,76.9,77.1,77.2,77.3 +Morocco,45.84,46.21,46.58,46.98,47.39,47.81,48.25,48.7,49.17,49.64,50.11,50.6,51.09,51.58,52.06,52.54,53.0,53.46,53.91,54.34,54.77,55.19,55.62,56.08,56.56,57.11,57.72,58.39,59.13,59.93,60.77,61.63,62.49,63.33,64.14,64.91,65.66,66.38,67.06,67.7,68.1,68.4,68.6,69.1,69.5,70.0,70.4,70.8,71.1,71.5,71.8,72.0,72.3,72.5,72.7,72.9,73.1,73.3,73.5,73.7,73.9,74.1,74.3,74.4,74.6,74.8 +Mozambique,32.26,32.92,33.58,34.25,34.91,35.58,36.23,36.89,37.54,38.17,38.79,39.4,39.98,40.54,41.1,41.66,42.21,42.78,43.37,43.97,44.58,45.21,45.85,46.46,47.06,47.61,48.1,48.52,48.88,49.17,49.4,49.57,49.72,49.87,50.02,50.21,50.45,50.74,51.08,51.5,51.7,52.1,52.3,52.6,52.7,52.6,52.5,52.6,52.6,52.3,52.8,52.7,52.9,53.0,52.9,53.0,53.2,54.0,54.4,54.4,54.5,54.5,54.8,56.1,57.1,58.12 +Myanmar,33.8,35.24,36.53,37.69,38.71,39.6,40.36,41.03,41.65,42.25,42.9,43.64,44.47,45.4,46.4,47.39,48.31,49.11,49.78,50.31,50.72,51.09,51.44,51.78,52.15,52.54,52.93,53.31,53.69,54.07,54.44,54.8,55.16,55.52,55.87,56.23,56.58,56.93,57.26,57.6,57.8,58.1,58.4,58.8,59.0,59.4,59.7,60.1,60.4,60.8,61.3,61.7,62.3,62.8,63.4,64.0,64.6,59.4,65.6,66.0,66.4,66.8,67.2,67.6,68.0,68.4 +Namibia,40.72,41.49,42.23,42.96,43.69,44.39,45.09,45.76,46.42,47.07,47.7,48.31,48.9,49.48,50.05,50.61,51.17,51.71,52.26,52.81,53.36,53.91,54.44,54.98,55.51,56.04,56.56,57.07,57.57,58.06,58.54,59.01,59.45,59.87,60.27,60.65,61.0,61.3,61.54,61.7,61.9,62.0,62.0,61.5,60.5,59.3,58.1,56.7,55.4,54.0,53.4,52.7,52.4,52.5,53.1,54.9,57.5,59.1,60.3,61.4,62.6,63.6,63.9,64.1,64.2,64.3 +Nepal,35.53,36.0,36.48,36.96,37.43,37.9,38.38,38.85,39.32,39.8,40.26,40.74,41.21,41.67,42.14,42.6,43.05,43.51,43.97,44.43,44.91,45.41,45.92,46.47,47.05,47.64,48.28,48.94,49.63,50.32,51.06,51.81,52.57,53.36,54.17,54.98,55.83,56.68,57.53,58.4,59.1,60.0,60.2,61.0,61.7,62.5,63.4,63.9,64.6,65.2,65.9,65.9,66.8,67.0,67.4,67.8,68.1,68.4,68.7,69.0,69.3,69.7,69.9,70.2,69.7,69.2 
+Netherlands,71.5,72.12,71.7,72.39,72.51,72.52,72.97,73.13,73.17,73.35,73.54,73.21,73.33,73.71,73.58,73.52,73.79,73.6,73.51,73.57,73.81,73.72,74.17,74.56,74.49,74.61,75.2,75.11,75.59,75.72,75.93,76.01,76.21,76.28,76.34,76.31,76.78,76.98,76.82,77.0,77.2,77.3,77.2,77.5,77.6,77.6,77.9,78.1,78.0,78.1,78.3,78.5,78.7,79.1,79.6,79.9,80.2,80.3,80.6,80.8,80.9,81.0,81.2,81.3,81.3,81.3 +Netherlands Antilles,58.96,60.02,61.0,61.89,62.7,63.43,64.08,64.65,65.15,65.6,65.99,66.34,66.67,67.0,67.33,67.67,68.03,68.41,68.81,69.22,69.63,70.05,70.45,70.84,71.21,71.58,71.94,72.29,72.64,72.96,73.27,73.56,73.8,74.02,74.19,74.33,74.42,74.49,74.52,74.54,74.53,74.52,74.5,74.49,74.48,74.48,74.5,74.53,74.57,74.65,74.76,74.91,75.09,75.3,75.53,75.76,75.98,76.18,76.36,76.52,76.65,76.77,76.89,77.01,77.14,77.27 +New Caledonia,49.51,50.34,51.16,51.96,52.74,53.5,54.25,54.98,55.69,56.38,57.06,57.72,58.36,58.99,59.6,60.19,60.77,61.33,61.89,62.42,62.95,63.46,63.95,64.44,64.91,65.37,65.81,66.25,66.68,67.09,67.5,67.89,68.28,68.65,69.02,69.37,69.72,70.05,70.38,70.7,71.01,71.31,71.6,71.89,72.16,72.43,72.7,72.95,73.21,73.46,73.7,73.94,74.17,74.4,74.62,74.84,75.05,75.26,75.47,75.67,75.88,76.09,76.31,76.52,76.74,76.96 +New Zealand,69.17,69.4,70.25,70.36,70.49,70.75,70.27,70.9,70.82,71.28,71.0,71.26,71.33,71.37,71.3,71.16,71.54,71.2,71.57,71.35,71.8,71.92,71.78,72.03,72.3,72.5,72.25,73.14,73.18,72.98,73.77,73.87,73.97,74.53,74.03,74.28,74.36,74.64,75.05,75.6,75.9,76.2,76.5,76.7,77.0,77.3,77.6,78.0,78.2,78.4,78.6,78.9,79.1,79.4,79.8,79.9,80.1,80.3,80.5,80.8,80.8,81.1,81.4,81.4,81.4,81.4 +Nicaragua,43.38,44.18,44.98,45.78,46.59,47.4,48.22,49.04,49.86,50.69,51.53,52.36,53.19,54.04,54.88,55.74,56.6,57.47,58.33,59.18,60.01,60.8,61.56,62.28,62.95,63.59,64.17,64.73,65.28,65.83,66.38,66.95,67.56,68.2,68.89,69.67,70.51,71.4,72.35,73.3,73.7,73.6,73.9,74.1,74.4,74.7,75.0,73.2,75.6,76.0,76.2,76.3,76.3,76.4,76.6,76.7,76.8,77.0,77.1,77.2,77.4,77.5,77.6,77.8,78.0,78.2 +Niger,35.61,35.72,35.83,35.95,36.08,36.22,36.37,36.51,36.67,36.82,36.97,37.1,37.24,37.36,37.49,37.61,37.73,37.88,38.05,38.24,38.45,38.69,38.95,39.25,39.57,39.97,40.4,40.9,41.44,42.0,42.58,43.13,43.66,44.15,44.63,45.09,45.57,46.07,46.62,47.2,47.9,48.2,48.6,49.1,49.5,50.2,50.6,51.2,51.8,52.4,52.9,53.7,54.4,55.2,55.9,56.6,57.3,58.0,58.6,59.2,59.6,60.0,60.4,60.7,61.0,61.3 +Nigeria,35.25,35.74,36.25,36.79,37.35,37.93,38.53,39.14,39.76,40.39,41.0,41.61,42.19,42.75,43.29,43.81,38.31,33.47,31.63,41.79,46.56,47.16,47.77,48.38,49.0,49.62,50.24,50.84,51.42,51.95,52.41,52.8,53.12,53.36,53.54,53.67,53.78,53.88,53.98,54.1,54.3,54.4,54.5,54.9,55.0,55.0,55.0,55.1,55.2,55.2,55.4,55.3,55.6,56.1,56.8,57.4,58.3,59.2,60.3,61.2,62.0,62.6,63.3,63.7,64.6,65.51 +North Korea,26.78,24.76,31.74,42.66,46.7,48.18,49.16,49.73,50.43,50.9,51.25,51.64,52.15,52.86,53.76,54.84,55.97,57.07,58.1,59.06,59.93,60.74,61.5,62.22,62.88,63.49,64.04,64.53,64.98,65.39,65.75,66.08,66.4,66.69,67.0,67.36,67.78,68.22,68.63,68.9,69.2,69.4,69.6,69.7,58.6,58.7,58.8,58.9,59.0,59.1,59.2,59.3,69.9,70.0,70.2,70.4,70.6,70.9,71.0,71.2,71.4,71.6,71.8,71.9,72.1,72.3 +Norway,72.58,72.72,73.2,73.28,73.5,73.55,73.5,73.5,73.63,73.66,73.67,73.55,73.2,73.7,73.83,74.11,74.18,74.07,73.78,74.19,74.3,74.46,74.56,74.88,74.93,75.17,75.51,75.54,75.54,75.8,76.0,76.13,76.19,76.36,76.07,76.21,76.07,76.17,76.52,76.6,77.0,77.1,77.5,77.7,77.9,78.2,78.3,78.3,78.5,78.6,78.9,79.1,79.5,79.8,80.2,80.4,80.6,80.8,80.8,81.1,81.1,81.6,81.6,82.0,82.0,82.0 
+Oman,35.74,36.78,37.81,38.82,39.82,40.8,41.78,42.75,43.7,44.64,45.57,46.47,47.37,48.26,49.13,49.97,50.8,51.62,52.43,53.26,54.14,55.07,56.06,57.11,58.2,59.32,60.45,61.57,62.65,63.7,64.69,65.65,66.59,67.48,68.35,69.17,69.95,70.7,71.41,72.1,72.5,72.9,73.3,73.6,73.9,74.2,74.5,74.8,75.1,75.2,75.4,75.4,75.6,75.8,76.0,76.0,76.0,76.2,76.2,76.1,76.3,76.6,76.8,77.0,77.2,77.4 +Pakistan,36.85,38.07,39.26,40.42,41.56,42.67,43.75,44.8,45.81,46.79,47.73,48.63,49.47,50.27,51.01,51.7,52.34,52.95,53.52,54.06,54.6,55.12,55.64,56.16,56.68,57.17,57.63,58.05,58.44,58.79,59.13,59.45,59.77,60.09,60.43,60.77,61.11,61.45,61.78,62.1,62.2,62.1,62.0,61.9,61.8,61.9,61.8,62.0,62.1,62.3,62.5,62.6,62.8,63.1,62.2,63.7,63.8,64.1,64.3,64.5,64.9,65.1,65.4,65.6,65.9,66.2 +Panama,56.42,56.99,57.56,58.14,58.72,59.31,59.89,60.47,61.05,61.62,62.17,62.71,63.22,63.72,64.21,64.7,65.18,65.65,66.15,66.66,67.18,67.72,68.26,68.81,69.35,69.88,70.38,70.85,71.3,71.72,72.1,72.47,72.8,73.13,73.45,73.76,74.06,74.34,74.62,74.9,75.0,75.0,75.2,75.2,75.3,75.4,75.6,75.8,76.2,76.5,76.7,76.9,77.0,77.1,77.2,77.2,77.3,77.3,77.3,77.3,77.4,77.5,77.6,77.9,78.2,78.5 +Papua New Guinea,34.02,34.53,35.04,35.54,36.03,36.53,37.02,37.51,38.04,38.6,39.2,39.87,40.6,41.39,42.22,43.07,43.92,44.74,45.53,46.27,46.97,47.63,48.27,48.9,49.54,50.21,50.91,51.65,52.4,53.11,53.74,54.26,54.65,54.92,55.08,55.19,55.3,55.47,55.7,56.0,56.0,56.2,56.4,56.7,56.9,57.0,57.2,56.5,57.4,57.5,57.6,57.6,57.7,57.7,57.9,58.0,58.2,58.6,58.8,59.1,59.4,59.7,60.2,60.5,60.9,61.3 +Paraguay,64.04,64.16,64.33,64.52,64.76,65.03,65.33,65.65,66.0,66.35,66.7,67.03,67.33,67.61,67.87,68.11,68.37,68.63,68.9,69.2,69.49,69.78,70.06,70.32,70.57,70.81,71.04,71.28,71.51,71.73,71.97,72.19,72.41,72.64,72.87,73.11,73.36,73.62,73.91,74.2,74.2,74.1,74.1,74.0,74.1,74.1,74.2,74.2,74.3,74.2,74.2,74.1,74.1,73.8,74.0,74.0,74.0,74.0,74.0,74.0,74.0,74.1,74.1,74.3,74.4,74.5 +Peru,43.99,44.43,44.91,45.41,45.95,46.51,47.1,47.72,48.34,48.95,49.56,50.14,50.7,51.25,51.79,52.38,53.03,53.74,54.52,55.36,56.2,57.04,57.85,58.6,59.31,59.99,60.63,61.28,61.93,62.59,63.25,63.9,64.55,65.18,65.8,66.41,66.99,67.57,68.14,68.7,69.2,69.5,70.0,70.5,71.1,71.7,72.4,73.1,73.9,74.6,75.2,75.7,76.2,76.7,77.2,77.7,77.9,78.2,78.2,78.4,78.5,78.7,79.1,79.3,79.5,79.7 +Philippines,55.43,55.83,56.23,56.61,56.99,57.36,57.74,58.11,58.46,58.82,59.17,59.53,59.87,60.21,60.56,60.91,61.26,61.6,61.94,62.26,62.54,62.77,62.95,63.1,63.21,63.32,63.44,63.6,63.81,64.06,64.37,64.74,65.13,65.53,65.95,66.35,66.72,67.05,67.34,67.6,67.9,68.2,68.3,68.6,68.8,68.9,69.0,69.0,69.2,69.1,69.0,69.0,69.1,69.1,69.1,69.2,69.7,69.8,69.9,70.1,70.2,70.3,70.3,70.7,71.0,71.3 +Poland,59.68,60.87,61.96,62.97,63.9,64.74,65.5,65.97,65.59,67.92,68.04,67.71,68.64,68.87,69.58,69.99,69.69,70.33,69.83,69.96,69.76,70.95,70.95,71.46,70.88,70.88,70.78,70.71,71.05,70.4,71.38,71.45,71.29,71.03,70.78,71.07,71.12,71.49,71.25,70.9,70.7,71.1,71.7,71.7,71.9,72.4,72.7,73.0,73.1,73.8,74.2,74.6,74.9,75.0,75.1,75.2,75.2,75.4,75.7,76.2,76.5,76.7,77.3,77.4,77.6,77.8 +Portugal,58.71,59.81,61.11,62.25,61.42,61.22,61.49,63.79,62.97,64.23,62.85,64.37,65.0,65.22,66.17,65.67,66.57,66.88,66.49,67.14,66.91,69.23,68.63,69.18,68.9,69.12,70.37,70.83,71.64,71.71,71.9,72.73,72.65,72.94,73.22,73.61,74.0,74.02,74.58,74.2,74.2,74.6,74.7,75.5,75.5,75.5,75.8,76.1,76.4,76.8,76.8,77.3,77.6,78.2,78.4,79.0,79.2,79.4,79.6,79.9,80.2,80.4,80.7,80.7,80.8,80.9 +Puerto 
Rico,61.57,62.94,64.16,65.22,66.13,66.87,67.48,67.94,68.31,68.58,68.8,69.0,69.21,69.45,69.72,70.03,70.36,70.7,71.03,71.35,71.66,71.98,72.26,72.53,72.77,72.98,73.15,73.28,73.38,73.47,73.56,73.65,73.76,73.89,74.0,74.07,74.08,74.04,73.93,73.8,73.8,73.7,73.8,73.1,73.3,73.6,74.5,75.1,75.2,75.6,75.8,76.2,76.5,76.5,76.6,76.8,76.9,77.0,77.1,77.1,77.4,77.7,77.9,78.2,78.5,78.8 +Qatar,53.86,54.67,55.47,56.26,57.04,57.81,58.58,59.33,60.08,60.82,61.57,62.31,63.06,63.79,64.53,65.25,65.95,66.64,67.29,67.91,68.49,69.03,69.52,69.98,70.4,70.79,71.15,71.49,71.83,72.14,72.45,72.75,73.01,73.27,73.51,73.74,73.94,74.14,74.32,74.5,74.4,74.5,74.5,74.4,74.4,74.5,74.6,74.6,74.6,74.7,75.0,75.0,75.2,75.8,76.3,76.7,77.3,77.9,78.5,79.2,79.7,79.9,79.9,79.8,79.7,79.6 +Reunion,45.98,47.28,48.53,49.72,50.86,51.94,52.96,53.93,54.85,55.73,56.57,57.37,58.15,58.9,59.64,60.36,61.06,61.74,62.41,63.06,63.69,64.3,64.89,65.46,66.0,66.53,67.05,67.55,68.03,68.51,68.97,69.43,69.87,70.3,70.73,71.14,71.54,71.94,72.32,72.69,73.06,73.41,73.77,74.11,74.45,74.79,75.12,75.44,75.76,76.08,76.38,76.68,76.97,77.26,77.53,77.81,78.08,78.35,78.62,78.88,79.14,79.4,79.65,79.89,80.12,80.35 +Romania,61.13,61.07,61.19,61.47,61.93,62.54,63.29,64.14,65.04,65.92,66.7,67.32,67.74,67.96,68.02,67.98,67.95,68.01,68.16,68.41,68.73,69.06,69.34,69.58,69.75,69.87,69.95,70.01,70.06,70.1,70.12,70.11,70.1,70.08,70.05,70.02,70.0,69.98,69.99,70.0,70.5,70.0,69.8,69.5,69.4,69.1,69.1,69.8,70.6,71.1,71.1,71.2,71.6,72.0,72.4,72.8,73.3,73.2,73.3,73.7,74.5,74.7,74.9,75.1,75.2,75.3 +Russia,57.76,58.16,58.96,60.96,63.35,64.85,63.95,66.84,67.59,68.61,68.85,68.51,68.98,69.77,69.36,69.43,69.21,69.17,68.65,68.76,69.02,68.92,68.89,68.88,68.24,67.98,67.85,67.89,67.61,67.57,67.79,68.25,68.01,67.53,68.19,69.8,69.81,69.66,69.57,69.2,69.1,68.0,65.2,63.8,64.4,65.7,67.0,67.2,65.9,65.1,65.1,64.9,64.7,65.1,65.1,66.7,67.7,67.9,68.8,68.9,69.8,70.4,70.8,70.9,71.0,71.1 +Rwanda,39.99,40.32,40.66,41.0,41.34,41.69,42.03,42.38,42.73,43.07,43.41,43.74,44.05,44.35,44.62,44.85,45.07,45.27,45.44,45.58,45.71,45.81,45.91,46.01,46.13,46.31,46.54,46.81,47.12,47.46,47.88,48.32,48.69,48.88,49.15,49.42,49.69,49.96,50.23,50.5,49.3,48.0,46.7,13.2,43.8,44.6,44.0,45.6,47.2,49.2,51.0,53.5,55.5,57.6,59.6,61.6,63.1,64.1,64.3,65.1,65.3,65.5,65.6,65.7,65.9,66.1 +Samoa,46.08,46.69,47.3,47.9,48.5,49.09,49.69,50.28,50.87,51.45,52.04,52.62,53.21,53.8,54.39,54.98,55.57,56.15,56.75,57.33,57.92,58.5,59.09,59.67,60.26,60.84,61.44,62.02,62.62,63.2,63.79,64.36,64.94,65.51,66.1,66.67,67.27,67.87,68.49,69.1,69.1,69.5,69.7,69.8,70.0,70.2,70.4,70.6,70.7,70.8,71.0,71.2,71.4,71.6,71.8,72.0,72.1,72.3,70.4,72.6,72.7,72.7,73.0,73.1,73.2,73.3 +Sao Tome and Principe,46.1,46.54,47.01,47.52,48.05,48.6,49.18,49.77,50.38,51.01,51.62,52.21,52.79,53.36,53.92,54.47,55.03,55.6,56.19,56.81,57.47,58.13,58.81,59.47,60.09,60.63,61.08,61.42,61.67,61.83,61.93,62.02,62.12,62.24,62.4,62.59,62.79,63.0,63.2,63.4,63.5,63.6,63.7,64.0,64.1,63.9,63.9,64.0,64.4,64.6,64.9,65.0,65.3,65.4,65.5,65.7,65.7,66.0,66.7,66.9,67.2,67.4,67.6,67.8,68.0,68.2 +Saudi Arabia,42.31,42.89,43.47,44.05,44.64,45.23,45.82,46.42,47.02,47.62,48.22,48.84,49.48,50.15,50.88,51.68,52.55,53.51,54.55,55.65,56.82,58.04,59.26,60.48,61.67,62.83,63.95,65.01,66.03,66.99,67.89,68.74,69.54,70.3,71.01,71.66,72.28,72.85,73.39,73.9,74.3,74.6,74.9,75.1,75.5,75.8,76.0,76.3,76.6,76.8,77.1,77.2,77.4,77.5,77.8,77.9,78.2,78.3,78.5,78.7,78.9,79.2,79.3,79.4,79.5,79.6 
+Senegal,34.89,35.39,35.88,36.34,36.78,37.19,37.57,37.93,38.23,38.46,38.63,38.72,38.76,38.74,38.71,38.7,38.74,38.9,39.17,39.59,40.18,40.94,41.85,42.85,43.94,45.07,46.21,47.33,48.4,49.42,50.43,51.44,52.47,53.48,54.45,55.36,56.17,56.86,57.41,57.8,58.0,58.0,58.2,58.2,58.4,58.8,58.9,59.1,59.2,59.7,60.2,60.4,61.3,61.7,62.2,62.5,63.0,63.5,63.9,64.2,64.4,64.6,64.8,65.0,65.3,65.6 +Serbia,58.63,59.11,59.61,60.12,60.63,61.15,61.69,62.23,62.78,63.33,63.88,64.44,64.99,65.53,66.06,66.56,67.05,67.51,67.94,68.34,68.7,69.03,69.32,69.59,69.82,70.03,70.21,70.37,70.53,70.68,70.84,71.01,71.19,71.38,71.58,71.78,71.99,72.17,72.35,72.5,71.4,72.4,72.3,72.1,72.0,71.9,72.1,71.5,71.0,72.1,72.4,72.5,72.7,72.9,73.2,73.6,74.0,74.3,74.6,74.8,75.1,75.4,75.7,75.9,76.2,76.5 +Seychelles,57.55,57.43,57.45,57.57,57.82,58.18,58.65,59.19,59.8,60.42,61.03,61.59,62.08,62.47,62.81,63.11,63.43,63.78,64.18,64.62,65.11,65.59,66.06,66.49,66.9,67.26,67.59,67.89,68.16,68.4,68.63,68.83,69.02,69.17,69.3,69.36,69.37,69.31,69.22,69.1,69.1,69.2,69.3,69.6,69.8,69.9,70.1,70.4,70.7,70.9,71.1,71.3,71.5,71.7,72.0,72.3,72.6,72.9,73.0,73.1,73.4,73.7,73.8,74.0,74.1,74.2 +Sierra Leone,31.66,32.13,32.62,33.1,33.6,34.09,34.59,35.08,35.58,36.07,36.57,37.06,37.57,38.1,38.7,39.38,40.18,41.08,42.08,43.15,44.28,45.39,46.48,47.5,48.45,49.31,50.11,50.83,51.49,52.04,52.5,52.83,53.06,53.16,53.14,52.98,52.72,52.36,51.98,51.6,51.4,51.9,52.1,51.6,50.9,51.9,51.3,49.7,49.2,51.5,51.8,51.6,51.7,52.0,52.3,52.7,53.0,53.6,54.2,55.0,55.6,56.4,57.1,55.2,57.1,59.07 +Singapore,58.62,59.54,60.41,61.24,62.01,62.73,63.39,64.01,64.54,65.02,65.41,65.72,65.97,66.16,66.31,66.46,66.63,66.84,67.09,67.4,67.75,68.12,68.5,68.88,69.26,69.62,69.98,70.34,70.68,71.04,71.39,71.74,72.11,72.49,72.87,73.27,73.68,74.1,74.51,74.9,75.6,76.0,76.2,76.3,76.4,76.7,77.2,77.6,78.0,78.3,78.6,78.9,79.3,79.8,80.0,80.2,80.4,80.6,81.0,81.3,81.5,81.6,81.7,81.9,82.0,82.1 +Slovak Republic,61.35,64.4,65.7,66.76,67.89,68.42,67.51,69.41,69.09,70.42,70.86,70.4,70.79,71.17,70.39,70.53,71.07,70.6,69.91,69.84,69.99,70.46,70.16,70.33,70.45,70.62,70.58,70.59,70.92,70.58,70.82,70.94,70.64,70.88,70.89,71.07,71.24,71.32,71.12,71.0,71.1,71.4,71.9,72.3,72.4,72.8,72.8,72.8,73.0,73.3,73.6,73.8,73.9,74.2,74.3,74.5,74.6,74.9,75.2,75.7,76.1,76.5,77.0,77.4,77.6,77.8 +Slovenia,64.71,65.28,65.83,66.34,66.81,67.25,67.66,68.02,68.34,68.62,68.82,68.98,69.08,69.12,69.14,69.14,69.14,69.17,69.23,69.32,69.47,69.66,69.86,70.09,70.32,70.51,70.66,70.77,70.85,70.89,70.94,71.03,70.74,71.2,71.63,72.17,72.1,72.75,73.19,73.7,73.6,73.8,73.9,74.2,74.6,75.0,75.2,75.4,75.7,76.1,76.3,76.6,76.8,77.2,77.6,77.9,78.2,78.7,79.1,79.5,79.9,80.1,80.3,80.8,80.9,81.0 +Solomon Islands,45.39,45.97,46.53,47.11,47.68,48.26,48.83,49.41,49.98,50.55,51.12,51.69,52.27,52.84,53.42,54.0,54.58,55.16,55.74,56.31,56.91,57.52,58.13,58.74,59.33,59.9,60.43,60.89,61.27,61.53,61.59,61.46,61.16,60.74,60.26,59.84,59.58,59.52,59.7,60.1,60.0,60.4,60.6,60.9,61.1,61.4,61.5,61.6,61.7,61.7,61.7,61.7,61.7,61.7,61.8,61.9,61.9,62.3,62.4,62.7,63.0,63.3,63.5,63.6,64.0,64.4 +Somalia,34.13,34.6,35.07,35.54,36.01,36.47,36.94,37.41,37.87,38.34,38.8,39.26,39.74,40.21,40.68,41.14,41.61,42.08,42.54,42.99,43.44,43.9,44.35,44.8,45.24,45.7,46.15,46.6,47.03,47.46,47.88,48.28,48.65,48.98,49.24,49.36,49.34,49.19,48.98,48.8,47.4,48.4,49.7,49.7,49.9,49.9,49.6,50.3,50.4,50.7,50.9,51.1,51.5,51.6,52.1,52.2,52.4,52.6,52.8,51.6,52.0,53.4,54.1,54.3,54.2,54.1 +South 
Africa,43.92,44.67,45.37,46.03,46.63,47.19,47.71,48.17,48.6,49.01,49.4,49.78,50.14,50.52,50.91,51.3,51.68,52.04,52.41,52.77,53.11,53.44,53.77,54.11,54.47,54.86,55.3,55.77,56.29,56.85,57.44,58.04,58.64,59.22,59.78,60.32,60.83,61.29,61.69,62.0,62.5,62.4,63.0,62.8,62.7,61.6,60.0,58.9,57.9,56.4,55.9,54.8,53.7,52.8,52.7,52.5,53.0,53.4,53.9,54.9,56.6,59.0,60.7,61.2,61.3,61.4 +South Korea,40.52,40.02,45.02,48.02,49.55,50.22,50.9,51.6,52.3,53.02,53.75,54.51,55.27,56.04,56.84,57.67,58.54,59.44,60.35,61.22,62.02,62.73,63.34,63.84,64.26,64.62,64.95,65.31,65.7,66.15,66.66,67.21,67.78,68.37,68.98,69.58,70.18,70.75,71.29,71.8,72.2,72.7,73.1,73.6,74.0,74.5,74.9,75.4,75.8,76.3,76.7,77.1,77.7,78.2,78.7,79.1,79.4,79.8,80.1,80.4,80.6,80.7,80.9,80.9,81.0,81.1 +South Sudan,28.6,29.37,30.11,30.82,31.51,32.17,32.81,33.42,34.02,34.61,35.18,35.75,36.32,36.9,37.48,38.04,38.6,39.15,39.68,40.21,40.75,41.29,41.84,42.39,42.93,43.43,43.87,44.26,44.61,44.93,45.25,45.6,46.01,46.5,47.06,47.72,48.45,49.23,50.05,50.9,51.0,51.6,51.9,52.3,52.7,53.1,53.4,53.8,54.1,54.4,54.7,54.9,55.0,55.2,55.3,55.4,55.5,55.6,55.8,56.0,55.9,56.0,56.0,56.1,56.1,56.1 +Spain,61.5,64.92,65.79,66.98,66.75,66.79,66.63,68.82,68.74,69.23,69.62,69.65,69.81,70.54,70.95,71.2,71.39,71.68,71.21,72.19,71.79,73.0,72.78,73.16,73.49,73.81,74.32,74.51,75.05,75.53,75.67,76.22,76.0,76.38,76.34,76.59,76.82,76.82,76.89,76.9,77.0,77.4,77.6,77.8,77.9,78.1,78.6,78.8,78.8,79.2,79.5,79.6,79.6,80.0,80.3,80.7,80.8,81.1,81.5,81.8,82.0,82.2,82.5,82.5,82.6,82.7 +Sri Lanka,53.25,54.34,55.32,56.22,57.01,57.71,58.32,58.86,59.32,59.76,60.18,60.61,61.06,61.55,62.07,62.62,63.17,63.7,64.21,64.69,65.15,65.56,65.97,66.36,66.76,67.17,67.6,68.06,68.52,68.97,69.35,69.64,69.83,69.93,69.97,70.0,70.05,70.16,70.32,70.5,71.3,72.0,72.9,72.8,71.7,71.3,71.4,72.0,72.4,72.4,73.3,73.7,74.0,69.4,73.9,73.9,74.4,74.0,74.1,75.0,76.4,76.8,77.1,77.4,77.6,77.8 +St. Lucia,51.89,52.09,52.4,52.81,53.32,53.92,54.6,55.36,56.15,56.97,57.75,58.47,59.11,59.66,60.15,60.58,61.0,61.45,61.94,62.47,63.04,63.63,64.22,64.8,65.38,65.96,66.54,67.1,67.64,68.15,68.6,68.99,69.29,69.53,69.7,69.83,69.93,70.03,70.12,70.2,70.4,70.5,70.7,70.9,71.1,71.2,71.5,71.7,71.8,72.0,72.1,72.3,72.5,72.8,73.1,73.4,73.7,74.1,74.3,74.5,74.6,74.7,74.7,74.8,74.8,74.8 +St. 
Vincent and the Grenadines,50.11,50.59,51.19,51.89,52.69,53.58,54.57,55.63,56.73,57.85,58.96,59.99,60.93,61.75,62.46,63.06,63.58,64.04,64.46,64.84,65.16,65.43,65.64,65.82,65.99,66.16,66.36,66.61,66.88,67.19,67.52,67.84,68.15,68.43,68.68,68.92,69.14,69.34,69.53,69.7,69.7,69.7,69.7,69.6,69.6,69.4,69.7,69.8,69.6,69.1,69.7,69.7,70.1,70.2,70.4,70.6,70.8,70.9,71.1,71.1,71.0,71.1,70.8,71.1,71.2,71.3 +Sudan,44.44,45.08,45.71,46.31,46.88,47.45,48.0,48.53,49.04,49.54,50.04,50.52,50.99,51.47,51.94,52.42,52.9,53.36,53.82,54.26,54.68,55.06,55.41,55.73,56.0,56.23,56.44,56.63,56.8,56.95,57.11,57.27,57.44,57.61,57.81,58.01,58.23,58.44,58.67,58.9,59.2,59.4,59.5,60.2,60.5,60.6,60.8,61.2,62.0,62.4,62.8,63.3,63.5,63.7,64.6,64.9,65.3,65.5,65.7,66.1,66.3,66.7,66.9,67.2,67.5,67.8 +Suriname,55.52,56.24,56.93,57.57,58.16,58.72,59.24,59.71,60.16,60.58,61.0,61.41,61.81,62.23,62.65,63.07,63.49,63.89,64.28,64.66,65.0,65.32,65.62,65.91,66.19,66.47,66.76,67.07,67.39,67.71,68.02,68.31,68.57,68.79,68.98,69.15,69.29,69.43,69.57,69.7,69.9,69.8,69.7,69.8,70.1,70.2,70.2,70.1,69.9,69.7,69.5,69.4,69.5,69.7,69.9,70.0,70.1,70.2,70.5,70.7,71.0,71.3,71.6,71.8,72.0,72.2 +Swaziland,41.01,41.51,41.98,42.44,42.88,43.3,43.7,44.08,44.44,44.78,45.1,45.42,45.73,46.05,46.39,46.76,47.2,47.67,48.21,48.79,49.4,50.03,50.67,51.3,51.94,52.58,53.24,53.92,54.62,55.31,56.02,56.71,57.38,58.0,58.6,59.15,59.67,60.13,60.5,60.7,60.7,61.0,61.3,60.7,59.1,57.1,55.8,53.5,51.4,48.8,46.6,45.1,44.0,43.0,42.5,43.1,44.3,45.1,45.9,46.4,48.0,49.1,49.4,49.8,51.8,53.88 +Sweden,71.35,71.84,71.88,72.34,72.58,72.64,72.47,73.11,73.34,73.01,73.47,73.34,73.53,73.7,73.85,74.09,74.12,73.99,74.11,74.66,74.58,74.68,74.83,74.94,74.95,74.96,75.39,75.48,75.52,75.74,76.04,76.36,76.6,76.86,76.72,76.98,77.12,77.01,77.67,77.6,77.7,78.1,78.3,78.5,78.9,79.1,79.4,79.5,79.5,79.7,79.8,80.0,80.2,80.2,80.6,80.8,80.9,81.1,81.2,81.6,81.7,81.8,81.9,82.1,82.1,82.1 +Switzerland,68.72,69.63,69.55,70.02,70.1,70.23,70.58,71.32,71.48,71.46,71.79,71.35,71.34,72.23,72.36,72.5,72.8,72.75,72.76,73.18,73.3,73.82,74.12,74.47,74.86,74.98,75.43,75.39,75.69,75.69,75.92,76.26,76.27,76.87,76.99,77.17,77.47,77.49,77.68,77.5,77.6,77.9,78.3,78.4,78.5,79.1,79.2,79.5,79.8,79.8,80.2,80.4,80.6,81.0,81.3,81.5,81.7,82.0,82.0,82.3,82.6,82.7,82.8,82.9,83.0,83.1 +Syria,47.87,48.44,49.02,49.59,50.15,50.7,51.25,51.79,52.33,52.87,53.43,53.98,54.56,55.15,55.77,56.42,57.12,57.83,58.57,59.31,60.08,60.82,61.56,62.26,62.95,63.6,64.24,64.84,65.44,66.01,66.56,67.08,67.58,68.05,68.51,68.94,69.35,69.75,70.14,70.5,71.0,71.8,72.0,72.3,72.7,73.1,73.4,73.8,74.1,74.4,74.6,74.9,75.1,75.3,75.5,75.7,75.9,76.1,76.3,76.5,75.1,68.1,69.0,67.2,68.2,69.21 +Taiwan,55.11,58.51,60.31,62.01,62.41,62.51,62.41,64.21,64.22,64.42,64.92,65.22,66.02,66.72,67.42,67.42,67.52,67.62,68.62,68.67,69.08,69.38,69.43,69.8,70.05,70.41,70.58,71.15,71.28,71.53,71.63,72.14,72.12,72.79,72.98,73.11,73.4,73.22,73.53,73.8,74.2,74.3,74.5,74.6,74.6,74.7,75.2,75.4,75.3,76.0,76.4,76.9,77.3,77.3,77.4,77.8,78.2,78.4,78.7,79.0,78.8,79.0,79.3,79.4,79.5,79.6 +Tajikistan,52.94,53.4,53.87,54.33,54.79,55.26,55.72,56.17,56.64,57.1,57.57,58.03,58.51,58.98,59.45,59.9,60.34,60.77,61.17,61.55,61.9,62.23,62.53,62.81,63.08,63.34,63.57,63.81,64.04,64.28,64.53,64.8,65.07,65.34,65.55,65.69,65.73,65.67,65.5,65.3,65.3,62.6,64.2,64.1,64.1,63.3,64.8,64.9,65.5,65.8,66.1,66.5,66.9,67.5,68.0,68.7,69.2,69.6,70.0,70.1,70.1,70.8,71.4,71.9,72.4,72.9 
+Tanzania,41.66,42.19,42.69,43.18,43.63,44.05,44.46,44.84,45.22,45.57,45.91,46.26,46.62,46.99,47.37,47.77,48.19,48.62,49.07,49.53,50.03,50.55,51.09,51.65,52.19,52.71,53.19,53.61,53.98,54.29,54.56,54.82,55.05,55.26,55.44,55.54,55.58,55.51,55.39,55.2,55.1,54.7,54.5,54.0,53.9,53.8,53.8,53.7,53.8,54.3,54.8,55.4,55.9,56.5,57.1,57.9,59.1,60.4,60.8,61.4,61.7,61.9,62.7,63.3,64.1,64.91 +Thailand,51.14,51.5,51.9,52.32,52.78,53.28,53.8,54.35,54.91,55.46,56.01,56.51,56.98,57.4,57.8,58.18,58.56,58.96,59.39,59.86,60.33,60.82,61.29,61.77,62.24,62.7,63.15,63.62,64.1,64.62,65.22,65.91,66.69,67.52,68.36,69.15,69.84,70.38,70.76,71.0,71.0,70.9,70.8,70.6,70.6,70.6,70.5,70.4,70.5,70.7,71.2,71.7,72.1,72.2,73.1,73.5,73.8,73.9,74.0,74.2,74.3,74.4,74.4,74.6,74.7,74.8 +Timor-Leste,31.41,32.12,32.83,33.54,34.24,34.94,35.64,36.34,37.04,37.74,38.45,39.15,39.85,40.55,41.29,42.12,43.02,43.96,44.86,45.56,45.8,45.51,44.71,43.49,42.12,40.94,40.25,40.27,41.01,42.45,44.42,46.61,48.76,50.76,52.51,53.99,55.26,56.41,57.47,58.5,59.2,59.9,60.6,61.3,61.8,62.3,62.4,62.8,62.3,60.7,64.4,65.3,65.7,66.5,67.5,68.5,69.2,69.9,70.4,70.8,71.3,71.7,72.0,72.3,72.4,72.5 +Togo,34.69,35.42,36.15,36.86,37.57,38.28,38.98,39.68,40.38,41.06,41.74,42.42,43.1,43.77,44.43,45.09,45.75,46.41,47.07,47.72,48.36,49.0,49.63,50.26,50.88,51.49,52.09,52.7,53.29,53.87,54.43,54.97,55.48,55.96,56.39,56.79,57.14,57.42,57.65,57.8,57.8,57.9,57.8,57.6,57.6,57.3,56.9,56.6,56.8,56.7,56.7,56.7,56.4,56.8,56.8,57.5,57.5,57.5,58.0,58.7,59.6,60.3,60.7,61.1,61.5,61.9 +Tonga,58.0,58.35,58.7,59.05,59.41,59.77,60.12,60.48,60.84,61.2,61.56,61.91,62.26,62.6,62.94,63.27,63.61,63.93,64.26,64.58,64.88,65.17,65.44,65.69,65.93,66.16,66.39,66.61,66.84,67.08,67.32,67.56,67.8,68.04,68.27,68.48,68.67,68.83,68.98,69.1,69.3,69.4,69.5,69.5,69.6,69.7,69.7,69.7,69.6,69.6,69.6,69.7,69.6,69.8,70.0,70.1,70.2,70.3,68.6,70.7,70.8,71.0,71.2,71.3,71.5,71.7 +Trinidad and Tobago,57.36,57.85,58.39,58.98,59.61,60.27,60.97,61.68,62.38,63.07,63.68,64.2,64.63,64.94,65.17,65.31,65.41,65.5,65.6,65.73,65.91,66.11,66.33,66.57,66.83,67.08,67.33,67.54,67.73,67.89,68.03,68.16,68.28,68.39,68.5,68.62,68.74,68.87,68.98,69.1,69.3,69.2,69.3,69.2,69.3,69.3,69.4,69.6,69.3,69.5,69.8,69.9,70.4,70.9,71.1,71.3,71.5,71.7,71.8,71.8,71.9,72.0,72.1,72.3,72.4,72.5 +Tunisia,39.03,39.33,39.68,40.06,40.48,40.94,41.43,41.97,42.56,43.2,43.89,44.65,45.47,46.35,47.31,48.33,49.42,50.56,51.74,52.94,54.16,55.37,56.57,57.75,58.9,60.03,61.15,62.27,63.36,64.41,65.4,66.31,67.13,67.87,68.56,69.2,69.83,70.48,71.13,71.8,72.0,72.2,72.2,72.5,72.9,73.4,73.9,74.3,74.7,75.0,75.3,75.5,75.7,76.0,76.2,76.4,76.6,76.8,77.0,77.1,77.2,77.4,77.5,77.6,77.6,77.6 +Turkey,41.2,41.68,42.2,42.76,43.35,43.99,44.67,45.38,46.13,46.91,47.71,48.52,49.35,50.15,50.96,51.74,52.48,53.21,53.91,54.59,55.27,55.96,56.65,57.36,58.08,58.81,59.55,60.29,61.03,61.74,62.45,63.15,63.82,64.49,65.15,65.76,66.37,66.96,67.53,68.1,68.5,69.2,69.7,69.8,70.0,70.6,71.2,72.0,71.5,73.8,74.4,75.1,75.1,75.8,76.2,76.7,77.4,77.8,78.5,78.8,78.8,79.1,78.8,79.1,79.2,79.3 +Turkmenistan,50.89,51.34,51.79,52.25,52.69,53.14,53.58,54.03,54.47,54.91,55.36,55.82,56.27,56.72,57.17,57.61,58.02,58.42,58.8,59.15,59.46,59.74,60.01,60.25,60.49,60.73,61.0,61.28,61.58,61.9,62.24,62.57,62.89,63.18,63.44,63.63,63.77,63.85,63.89,63.9,63.5,63.5,63.5,63.4,63.3,63.2,63.2,63.3,63.5,63.7,64.1,64.4,64.8,65.3,65.8,66.3,66.8,67.2,67.6,68.1,68.5,68.9,69.2,69.6,70.0,70.4 
+Uganda,39.94,40.51,41.08,41.65,42.24,42.82,43.42,44.03,44.64,45.27,45.91,46.56,47.22,47.86,48.49,49.07,49.58,50.05,50.43,50.74,50.99,51.17,51.33,51.45,51.55,51.65,51.75,51.83,51.93,52.01,52.09,52.14,52.17,52.16,52.09,51.94,51.72,51.42,51.08,50.7,50.0,49.6,49.0,48.5,48.3,48.2,48.5,48.7,48.9,49.1,49.7,50.3,51.2,52.0,53.5,54.9,55.3,56.0,57.0,57.8,58.6,59.3,60.1,60.7,61.3,61.91 +Ukraine,62.2,62.94,63.63,64.42,66.26,67.15,67.19,68.88,69.26,70.88,71.15,70.56,71.18,71.97,71.4,71.66,71.25,71.33,70.7,70.59,70.81,70.57,70.75,70.63,69.96,70.01,69.68,69.63,69.36,69.33,69.36,69.51,69.48,69.17,69.47,70.82,70.61,70.49,70.43,70.0,69.4,68.8,68.3,67.5,66.5,66.7,67.3,68.1,67.7,67.3,67.5,67.5,67.7,67.5,67.1,67.9,67.6,67.8,69.6,70.5,71.1,71.2,71.3,71.3,71.5,71.7 +United Arab Emirates,41.83,43.04,44.22,45.37,46.5,47.62,48.7,49.77,50.82,51.85,52.89,53.91,54.91,55.9,56.87,57.81,58.7,59.54,60.33,61.08,61.78,62.45,63.09,63.7,64.3,64.87,65.41,65.93,66.43,66.91,67.36,67.79,68.2,68.6,68.98,69.34,69.68,70.0,70.31,70.6,70.8,71.1,71.3,71.6,71.9,72.1,72.4,72.8,73.0,73.3,73.6,73.8,74.1,74.4,75.2,75.7,75.6,75.6,75.6,75.6,75.5,75.5,75.4,75.4,75.4,75.4 +United Kingdom,68.26,69.55,69.82,70.19,70.15,70.42,70.54,70.71,70.81,71.02,70.77,70.84,70.74,71.53,71.52,71.43,72.06,71.68,71.64,71.89,72.2,71.98,72.18,72.38,72.65,72.62,73.11,73.04,73.14,73.57,73.9,74.03,74.28,74.66,74.51,74.78,75.12,75.23,75.36,75.7,76.0,76.2,76.3,76.6,76.7,76.9,77.1,77.3,77.5,77.8,78.0,78.2,78.4,78.7,79.0,79.2,79.5,79.7,80.0,80.2,80.5,80.7,80.8,80.9,81.0,81.1 +United States,68.22,68.44,68.79,69.58,69.63,69.71,69.49,69.76,69.98,69.91,70.32,70.21,70.04,70.33,70.41,70.43,70.76,70.42,70.66,70.92,71.24,71.34,71.54,72.08,72.68,72.99,73.38,73.58,74.03,73.93,74.36,74.65,74.71,74.81,74.79,74.87,75.01,75.02,75.1,75.4,75.5,75.8,75.7,75.8,75.9,76.3,76.6,76.8,76.9,76.9,76.9,77.1,77.3,77.6,77.6,77.8,78.1,78.3,78.5,78.8,78.9,79.0,79.1,79.1,79.1,79.1 +Uruguay,65.96,66.11,66.28,66.47,66.69,66.93,67.18,67.43,67.7,67.95,68.19,68.39,68.55,68.67,68.74,68.78,68.8,68.82,68.84,68.88,68.94,69.01,69.1,69.23,69.39,69.58,69.8,70.05,70.32,70.6,70.89,71.17,71.46,71.72,71.97,72.2,72.41,72.61,72.81,73.0,72.6,73.2,73.2,73.3,73.4,73.5,73.7,74.0,74.3,74.6,74.8,75.0,75.0,75.3,75.5,75.7,75.7,76.0,76.2,76.2,76.3,76.3,76.4,76.6,76.8,77.0 +Uzbekistan,55.32,55.78,56.23,56.68,57.13,57.58,58.02,58.46,58.91,59.35,59.8,60.25,60.7,61.15,61.59,62.02,62.43,62.83,63.2,63.54,63.86,64.14,64.4,64.64,64.87,65.11,65.37,65.64,65.94,66.25,66.59,66.93,67.27,67.59,67.85,68.02,68.09,68.06,67.96,67.8,67.6,67.3,67.0,66.7,66.6,66.7,66.9,67.1,67.4,67.6,67.8,67.9,68.1,68.3,68.5,68.8,69.2,69.6,69.9,70.2,70.6,70.9,71.2,71.5,71.8,72.1 +Vanuatu,40.79,41.36,41.94,42.51,43.09,43.67,44.24,44.82,45.4,45.97,46.55,47.14,47.71,48.29,48.87,49.44,50.01,50.56,51.12,51.67,52.21,52.77,53.33,53.89,54.46,55.05,55.64,56.24,56.83,57.41,57.97,58.5,58.98,59.44,59.87,60.27,60.67,61.07,61.48,61.9,62.0,62.1,62.2,62.2,62.3,62.4,61.2,62.5,62.0,62.5,62.5,62.5,62.5,62.6,62.7,62.9,63.2,63.4,63.6,63.9,64.1,64.4,64.6,64.7,64.9,65.1 +Venezuela,54.64,55.24,55.84,56.43,57.03,57.64,58.25,58.86,59.47,60.08,60.69,61.3,61.91,62.51,63.09,63.66,64.22,64.77,65.3,65.8,66.27,66.72,67.14,67.53,67.9,68.23,68.52,68.79,69.04,69.3,69.57,69.85,70.17,70.53,70.89,71.27,71.63,71.95,72.24,72.5,72.4,72.4,72.5,72.4,72.7,73.1,73.6,73.6,70.2,73.8,73.8,73.8,73.5,74.3,74.6,74.5,74.4,74.2,74.4,74.9,74.8,74.6,74.7,74.8,74.8,74.8 
+Vietnam,51.98,52.81,53.6,54.36,55.11,55.83,56.52,57.19,57.86,58.52,59.17,59.82,60.42,60.95,61.32,61.36,61.06,60.45,59.63,58.78,58.17,58.0,58.35,59.23,60.54,62.07,63.58,64.86,65.84,66.49,66.86,67.1,67.3,67.51,67.77,68.07,68.38,68.68,69.0,69.3,69.6,69.8,70.1,70.3,70.6,70.9,71.1,71.5,71.7,72.0,72.2,72.5,72.8,73.0,73.3,73.5,73.8,74.1,74.3,74.5,74.7,74.9,75.0,75.2,75.4,75.6 +Virgin Islands (U.S.),57.9,58.87,59.74,60.54,61.25,61.88,62.44,62.93,63.36,63.75,64.11,64.46,64.82,65.2,65.6,66.02,66.44,66.87,67.29,67.71,68.12,68.53,68.94,69.34,69.73,70.11,70.46,70.8,71.12,71.43,71.74,72.05,72.38,72.71,73.06,73.41,73.75,74.09,74.42,74.73,75.04,75.34,75.64,75.94,76.23,76.52,76.8,77.07,77.33,77.57,77.8,78.0,78.19,78.36,78.52,78.69,78.86,79.05,79.25,79.46,79.69,79.92,80.15,80.38,80.6,80.82 +West Bank and Gaza,47.03,47.31,47.63,47.97,48.36,48.78,49.23,49.72,50.25,50.82,51.43,52.08,52.75,53.47,54.2,54.94,55.7,56.45,57.22,57.97,58.73,59.48,60.26,61.03,61.81,62.6,63.39,64.18,64.96,65.74,66.48,67.21,67.92,68.59,69.23,69.82,70.38,70.88,71.36,71.8,72.0,72.4,72.8,73.3,73.7,74.0,74.2,74.5,74.7,74.4,74.7,74.4,74.4,74.4,74.6,74.4,74.3,74.1,73.8,74.3,74.2,74.2,74.4,74.5,74.6,74.7 +Western Sahara,34.95,35.33,35.72,36.1,36.48,36.86,37.24,37.62,37.99,38.37,38.75,39.12,39.5,39.88,40.26,40.62,40.97,41.32,41.67,42.07,42.52,43.07,43.7,44.43,45.23,46.11,47.01,47.92,48.82,49.72,50.61,51.5,52.4,53.3,54.17,54.99,55.74,56.43,57.04,57.59,58.09,58.56,59.03,59.51,60.0,60.51,61.04,61.57,62.11,62.64,63.15,63.65,64.13,64.58,65.01,65.41,65.79,66.16,66.51,66.84,67.17,67.47,67.76,68.04,68.3,68.56 +Yemen,24.0,24.96,25.92,26.87,27.84,28.8,29.76,30.72,31.68,32.64,33.58,34.52,35.45,36.37,37.27,38.15,39.01,39.87,40.71,41.55,42.4,43.28,44.17,45.1,46.05,47.05,48.06,49.08,50.11,51.13,52.13,53.09,54.02,54.89,55.69,56.4,57.04,57.6,58.08,58.5,58.9,59.3,59.6,59.7,60.3,60.7,61.1,61.5,62.0,62.4,62.8,63.3,63.7,64.2,64.6,65.0,65.2,65.7,66.2,66.6,66.6,66.7,67.1,67.1,66.0,64.92 +Zambia,43.22,43.79,44.38,44.95,45.53,46.1,46.67,47.24,47.79,48.34,48.89,49.42,49.94,50.44,50.96,51.49,52.05,52.64,53.25,53.88,54.51,55.13,55.71,56.24,56.7,57.07,57.36,57.57,57.66,57.62,57.45,57.14,56.71,56.17,55.54,54.85,54.09,53.33,52.59,51.9,50.7,49.6,48.6,47.7,46.9,46.3,45.9,45.4,45.0,44.8,44.9,45.1,45.3,46.3,47.1,47.9,49.0,51.1,52.3,53.1,53.7,54.7,55.6,56.3,56.7,57.1 +Zimbabwe,48.75,49.25,49.75,50.25,50.73,51.22,51.71,52.17,52.64,53.11,53.55,53.99,54.42,54.83,55.25,55.65,56.04,56.43,56.83,57.22,57.63,58.05,58.47,58.92,59.41,59.94,60.53,61.17,61.82,62.48,63.13,63.73,64.23,64.63,64.86,64.9,64.74,64.39,63.81,63.0,62.7,61.4,59.8,58.2,56.0,54.4,52.8,50.9,49.3,47.9,47.0,45.9,45.3,44.7,45.1,45.5,46.4,47.3,48.0,49.1,51.6,54.2,55.7,57.0,59.3,61.69 diff --git a/previous_versions/v0.4.0/data/offshore.csv b/previous_versions/v0.4.0/data/offshore.csv new file mode 100755 index 000000000..5aa096441 --- /dev/null +++ b/previous_versions/v0.4.0/data/offshore.csv @@ -0,0 +1,828 @@ +college_grad,response +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion 
+yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion 
+yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +yes,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +no,opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion 
+yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +yes,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion +no,no opinion diff --git a/previous_versions/v0.4.0/data/zinc_tidy.csv b/previous_versions/v0.4.0/data/zinc_tidy.csv new file mode 100755 index 000000000..84856e658 --- /dev/null +++ 
b/previous_versions/v0.4.0/data/zinc_tidy.csv @@ -0,0 +1,21 @@ +loc_id,location,concentration +1.0,bottom,0.43 +1.0,surface,0.415 +2.0,bottom,0.266 +2.0,surface,0.238 +3.0,bottom,0.567 +3.0,surface,0.39 +4.0,bottom,0.531 +4.0,surface,0.41 +5.0,bottom,0.707 +5.0,surface,0.605 +6.0,bottom,0.716 +6.0,surface,0.609 +7.0,bottom,0.651 +7.0,surface,0.632 +8.0,bottom,0.589 +8.0,surface,0.523 +9.0,bottom,0.469 +9.0,surface,0.411 +10.0,bottom,0.723 +10.0,surface,0.612 diff --git a/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg b/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg new file mode 100755 index 000000000..92464e41e Binary files /dev/null and b/previous_versions/v0.4.0/images/Pie-I-have-Eaten.jpg differ diff --git a/previous_versions/v0.4.0/images/apps.jpg b/previous_versions/v0.4.0/images/apps.jpg new file mode 100755 index 000000000..7ef7ea59a Binary files /dev/null and b/previous_versions/v0.4.0/images/apps.jpg differ diff --git a/previous_versions/v0.4.0/images/coggle.png b/previous_versions/v0.4.0/images/coggle.png new file mode 100755 index 000000000..668944334 Binary files /dev/null and b/previous_versions/v0.4.0/images/coggle.png differ diff --git a/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png b/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png new file mode 100755 index 000000000..054694d97 Binary files /dev/null and b/previous_versions/v0.4.0/images/credit_card_balance_3D_scatterplot.png differ diff --git a/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png b/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png new file mode 100755 index 000000000..d7037938b Binary files /dev/null and b/previous_versions/v0.4.0/images/credit_card_balance_regression_plane.png differ diff --git a/previous_versions/v0.4.0/images/dashboard.jpg b/previous_versions/v0.4.0/images/dashboard.jpg new file mode 100755 index 000000000..57996bf17 Binary files /dev/null and b/previous_versions/v0.4.0/images/dashboard.jpg differ diff --git a/previous_versions/v0.4.0/images/datacamp.png b/previous_versions/v0.4.0/images/datacamp.png new file mode 100755 index 000000000..2911de3c4 Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png b/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png new file mode 100755 index 000000000..17fcfa240 Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_inference_for_categorical_data.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png b/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png new file mode 100755 index 000000000..811743c26 Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_inference_for_numerical_data.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png b/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png new file mode 100755 index 000000000..143c4cee8 Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_inference_for_regression.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_intermediate_R.png b/previous_versions/v0.4.0/images/datacamp_intermediate_R.png new file mode 100755 index 000000000..81b3cf7fb Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_intermediate_R.png differ diff --git 
a/previous_versions/v0.4.0/images/datacamp_intro_to_R.png b/previous_versions/v0.4.0/images/datacamp_intro_to_R.png new file mode 100755 index 000000000..193664acd Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_intro_to_R.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png b/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png new file mode 100755 index 000000000..8bd13337a Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_intro_to_modeling.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png b/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png new file mode 100755 index 000000000..69ca9772a Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_intro_to_tidyverse.png differ diff --git a/previous_versions/v0.4.0/images/datacamp_working_with_data.png b/previous_versions/v0.4.0/images/datacamp_working_with_data.png new file mode 100755 index 000000000..eeb4ac861 Binary files /dev/null and b/previous_versions/v0.4.0/images/datacamp_working_with_data.png differ diff --git a/previous_versions/v0.4.0/images/engine.jpg b/previous_versions/v0.4.0/images/engine.jpg new file mode 100755 index 000000000..597512b49 Binary files /dev/null and b/previous_versions/v0.4.0/images/engine.jpg differ diff --git a/previous_versions/v0.4.0/images/errors.png b/previous_versions/v0.4.0/images/errors.png new file mode 100755 index 000000000..43c19d9a3 Binary files /dev/null and b/previous_versions/v0.4.0/images/errors.png differ diff --git a/previous_versions/v0.4.0/images/filter.png b/previous_versions/v0.4.0/images/filter.png new file mode 100755 index 000000000..8cd96205d Binary files /dev/null and b/previous_versions/v0.4.0/images/filter.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png b/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png new file mode 100755 index 000000000..e14558e96 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart.009-cropped.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png b/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png new file mode 100755 index 000000000..0ce574917 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart.010-cropped.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png b/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png new file mode 100755 index 000000000..7c8b6c6a7 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart.011-cropped.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png new file mode 100755 index 000000000..71139e1a1 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.002.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png new file mode 100755 index 000000000..e78715c4d Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.004.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png new file mode 100755 index 000000000..dce19ad70 
Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.005.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png new file mode 100755 index 000000000..964f0ae8f Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/flowchart/flowchart.006.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png b/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png new file mode 100755 index 000000000..83b51e66e Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/calculate.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png b/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png new file mode 100755 index 000000000..d9baa59f1 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/ci_diagram.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/generate.png b/previous_versions/v0.4.0/images/flowcharts/infer/generate.png new file mode 100755 index 000000000..d81baa6ff Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/generate.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/ht.png b/previous_versions/v0.4.0/images/flowcharts/infer/ht.png new file mode 100755 index 000000000..5effd3674 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/ht.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png b/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png new file mode 100755 index 000000000..582bdad19 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/ht_diagram.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/specify.png b/previous_versions/v0.4.0/images/flowcharts/infer/specify.png new file mode 100755 index 000000000..7f68e18b7 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/specify.png differ diff --git a/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png b/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png new file mode 100755 index 000000000..895426ff3 Binary files /dev/null and b/previous_versions/v0.4.0/images/flowcharts/infer/visualize.png differ diff --git a/previous_versions/v0.4.0/images/group_summary.png b/previous_versions/v0.4.0/images/group_summary.png new file mode 100755 index 000000000..2f09b0f0f Binary files /dev/null and b/previous_versions/v0.4.0/images/group_summary.png differ diff --git a/previous_versions/v0.4.0/images/guess_the_correlation.png b/previous_versions/v0.4.0/images/guess_the_correlation.png new file mode 100755 index 000000000..fefdb23b1 Binary files /dev/null and b/previous_versions/v0.4.0/images/guess_the_correlation.png differ diff --git a/previous_versions/v0.4.0/images/ht.png b/previous_versions/v0.4.0/images/ht.png new file mode 100755 index 000000000..204422828 Binary files /dev/null and b/previous_versions/v0.4.0/images/ht.png differ diff --git a/previous_versions/v0.4.0/images/iphone.jpg b/previous_versions/v0.4.0/images/iphone.jpg new file mode 100755 index 000000000..cf3a222a0 Binary files /dev/null and b/previous_versions/v0.4.0/images/iphone.jpg differ diff --git a/previous_versions/v0.4.0/images/ismay.jpeg b/previous_versions/v0.4.0/images/ismay.jpeg new file mode 100755 index 000000000..f68ead9ed Binary files /dev/null and 
b/previous_versions/v0.4.0/images/ismay.jpeg differ diff --git a/previous_versions/v0.4.0/images/join-inner.png b/previous_versions/v0.4.0/images/join-inner.png new file mode 100755 index 000000000..18e996daa Binary files /dev/null and b/previous_versions/v0.4.0/images/join-inner.png differ diff --git a/previous_versions/v0.4.0/images/kim.jpeg b/previous_versions/v0.4.0/images/kim.jpeg new file mode 100755 index 000000000..524aff3d5 Binary files /dev/null and b/previous_versions/v0.4.0/images/kim.jpeg differ diff --git a/previous_versions/v0.4.0/images/logos/book_cover.png b/previous_versions/v0.4.0/images/logos/book_cover.png new file mode 100755 index 000000000..f20fd9ef6 Binary files /dev/null and b/previous_versions/v0.4.0/images/logos/book_cover.png differ diff --git a/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png b/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png new file mode 100755 index 000000000..d28831d0b Binary files /dev/null and b/previous_versions/v0.4.0/images/logos/favicons/apple-touch-icon.png differ diff --git a/previous_versions/v0.4.0/images/logos/favicons/favicon.ico b/previous_versions/v0.4.0/images/logos/favicons/favicon.ico new file mode 100755 index 000000000..bddb10a6f Binary files /dev/null and b/previous_versions/v0.4.0/images/logos/favicons/favicon.ico differ diff --git a/previous_versions/v0.4.0/images/mutate.png b/previous_versions/v0.4.0/images/mutate.png new file mode 100755 index 000000000..ab15762b8 Binary files /dev/null and b/previous_versions/v0.4.0/images/mutate.png differ diff --git a/previous_versions/v0.4.0/images/read_excel.png b/previous_versions/v0.4.0/images/read_excel.png new file mode 100755 index 000000000..e9467bb82 Binary files /dev/null and b/previous_versions/v0.4.0/images/read_excel.png differ diff --git a/previous_versions/v0.4.0/images/relational-nycflights.png b/previous_versions/v0.4.0/images/relational-nycflights.png new file mode 100755 index 000000000..10b04ce0f Binary files /dev/null and b/previous_versions/v0.4.0/images/relational-nycflights.png differ diff --git a/previous_versions/v0.4.0/images/rstudio.png b/previous_versions/v0.4.0/images/rstudio.png new file mode 100755 index 000000000..e1d286545 Binary files /dev/null and b/previous_versions/v0.4.0/images/rstudio.png differ diff --git a/previous_versions/v0.4.0/images/sampling/shovel_025.jpg b/previous_versions/v0.4.0/images/sampling/shovel_025.jpg new file mode 100755 index 000000000..df2c5e1d2 Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/shovel_025.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/shovel_050.jpg b/previous_versions/v0.4.0/images/sampling/shovel_050.jpg new file mode 100755 index 000000000..68787cf3d Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/shovel_050.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/shovel_100.jpg b/previous_versions/v0.4.0/images/sampling/shovel_100.jpg new file mode 100755 index 000000000..1cc70a70f Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/shovel_100.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg b/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg new file mode 100755 index 000000000..9a045406f Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/tactile_1_b.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg b/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg new file mode 100755 index 
000000000..45b2791a9 Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/tactile_2_a.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg b/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg new file mode 100755 index 000000000..50ef8b56f Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/tactile_3_a.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg b/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg new file mode 100755 index 000000000..bd20120f3 Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling/tactile_3_c.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling_bowl_2.jpg b/previous_versions/v0.4.0/images/sampling_bowl_2.jpg new file mode 100755 index 000000000..48412bcfd Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling_bowl_2.jpg differ diff --git a/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg b/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg new file mode 100755 index 000000000..a38e5d063 Binary files /dev/null and b/previous_versions/v0.4.0/images/sampling_bowl_3_cropped.jpg differ diff --git a/previous_versions/v0.4.0/images/select.png b/previous_versions/v0.4.0/images/select.png new file mode 100755 index 000000000..a7329274a Binary files /dev/null and b/previous_versions/v0.4.0/images/select.png differ diff --git a/previous_versions/v0.4.0/images/sign-2408065_1920.png b/previous_versions/v0.4.0/images/sign-2408065_1920.png new file mode 100755 index 000000000..824dc86f0 Binary files /dev/null and b/previous_versions/v0.4.0/images/sign-2408065_1920.png differ diff --git a/previous_versions/v0.4.0/images/summarize1.png b/previous_versions/v0.4.0/images/summarize1.png new file mode 100755 index 000000000..e52e1d984 Binary files /dev/null and b/previous_versions/v0.4.0/images/summarize1.png differ diff --git a/previous_versions/v0.4.0/images/summary.png b/previous_versions/v0.4.0/images/summary.png new file mode 100755 index 000000000..86415225e Binary files /dev/null and b/previous_versions/v0.4.0/images/summary.png differ diff --git a/previous_versions/v0.4.0/images/tidy-1.png b/previous_versions/v0.4.0/images/tidy-1.png new file mode 100755 index 000000000..4287d74c6 Binary files /dev/null and b/previous_versions/v0.4.0/images/tidy-1.png differ diff --git a/previous_versions/v0.4.0/images/tidy1.png b/previous_versions/v0.4.0/images/tidy1.png new file mode 100755 index 000000000..88771ff58 Binary files /dev/null and b/previous_versions/v0.4.0/images/tidy1.png differ diff --git a/previous_versions/v0.4.0/index.html b/previous_versions/v0.4.0/index.html new file mode 100644 index 000000000..3fdbeb151 --- /dev/null +++ b/previous_versions/v0.4.0/index.html @@ -0,0 +1,941 @@ + + + + + + + + An Introduction to Statistical and Data Sciences via R + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +

      1 Introduction


      1.1 Important Note


      This is a previous version (v0.4.0) of ModernDive and may be out of date. For the current version of ModernDive, please go to ModernDive.com.


      Help! I’m new to R and RStudio and I need to learn about them! However, I’m completely new to coding! What do I do? If you’re asking yourself this question, then you’ve come to the right place! Start with our Introduction for Students.

• Are you an instructor hoping to use this book in your courses? Then click here for more information on how to teach with this book.
• Are you looking to connect with and contribute to ModernDive? Then click here for information on how.
• Are you curious about the publishing of this book? Then click here for more information on the open-source technology, in particular R Markdown and the bookdown package.

      This is version 0.4.0 of ModernDive published on July 21, 2018. For previous versions of ModernDive, see Section 1.6.


      1.2 Introduction for students


      This book assumes no prerequisites: no algebra, no calculus, and no prior programming/coding experience. This is intended to be a gentle introduction to the practice of analyzing data and answering questions using data the way data scientists, statisticians, data journalists, and other researchers would.


In Figure 1.1 we present a flowchart of what you'll cover in this book. You'll first get started with data in Chapter 2, where you'll learn about the difference between R and RStudio, start coding in R, understand what R packages are, and explore your first dataset: all domestic departure flights from a New York City airport in 2013. Then:

1. Data science: You'll assemble your data science toolbox using tidyverse packages (see the short code sketch after this list). In particular:
   • Ch.3: Visualizing data via the ggplot2 package.
   • Ch.4: Understanding the concept of "tidy" data as a standardized data input format for all packages in the tidyverse.
   • Ch.5: Wrangling data via the dplyr package.
2. Data modeling: Using these data science tools and helper functions from the moderndive package, you'll start performing data modeling. In particular:
   • Ch.6: Constructing basic regression models.
   • Ch.7: Constructing multiple regression models.
3. Statistical inference: Once again using your newly acquired data science tools, we'll unpack statistical inference using the infer package. In particular:
   • Ch.8: Understanding the role that sampling variability plays in statistical inference using both tactile and virtual simulations of sampling from a "bowl" with an unknown proportion of red balls.
   • Ch.9: Building confidence intervals.
   • Ch.10: Conducting hypothesis tests.
4. Data modeling revisited: Armed with your new understanding of statistical inference, you'll revisit and review the models you constructed in Ch.6 & Ch.7. In particular:
   • Ch.11: Interpreting both the statistical and practical significance of the results of the models.

      We’ll end with a discussion on what it means to “think with data” in Chapter 12 and present an example case study data analysis of house prices in Seattle.

      Figure 1.1: ModernDive Flowchart

      1.2.1 What you will learn from this book


      We hope that by the end of this book, you’ll have learned

      1. How to use R to explore data.
      2. How to answer statistical questions using tools like confidence intervals and hypothesis tests.
      3. How to effectively create “data stories” using these tools.

      What do we mean by data stories? We mean any analysis involving data that engages the reader in answering questions with careful visuals and thoughtful discussion, such as How strong is the relationship between per capita income and crime in Chicago neighborhoods? and How many f**ks does Quentin Tarantino give (as measured by the amount of swearing in his films)?. Further discussions on data stories can be found in this Think With Google article.


      For other examples of data stories constructed by students like yourselves, look at the final projects for two courses that have previously used ModernDive:


      This book will help you develop your “data science toolbox”, including tools such as data visualization, data formatting, data wrangling, and data modeling using regression. With these tools, you’ll be able to perform the entirety of the “data/science pipeline” while building data communication skills (see Subsection 1.2.2 for more details).


      In particular, this book will lean heavily on data visualization. In today’s world, we are bombarded with graphics that attempt to convey ideas. We will explore what makes a good graphic and what the standard ways are to convey relationships with data. You’ll also see the use of visualization to introduce concepts like mean, median, standard deviation, distributions, etc. In general, we’ll use visualization as a way of building almost all of the ideas in this book.


      To impart the statistical lessons in this book, we have intentionally minimized the number of mathematical formulas used and instead have focused on developing a conceptual understanding via data visualization, statistical computing, and simulations. We hope this is a more intuitive experience than the way statistics has traditionally been taught and how it is commonly perceived.


      Finally, you’ll learn the importance of literate programming. By this we mean you’ll learn how to write code that is useful not just for a computer to execute but also for readers to understand exactly what your analysis is doing and how you did it. This is part of a greater effort to encourage reproducible research (see Subsection 1.2.3 for more details). Hal Abelson coined the phrase that we will follow throughout this book:


      “Programs must be written for people to read, and only incidentally for machines to execute.”


      We understand that there may be challenging moments as you learn to program. Both of us continue to struggle and often find ourselves using web searches to find answers and reaching out to colleagues for help. In the long run though, we all can solve problems faster and more elegantly via programming. We wrote this book as our way to help you get started, and you should know that there is a huge community of R users that are always happy to help everyone along as well. This community exists in particular on the internet on various forums and websites such as stackoverflow.com.


      1.2.2 Data/science pipeline


      You may think of statistics as just being a bunch of numbers. We commonly hear the phrase “statistician” when listening to broadcasts of sporting events. Statistics (in particular, data analysis), in addition to describing numbers like with baseball batting averages, plays a vital role in all of the sciences. You’ll commonly hear the phrase “statistically significant” thrown around in the media. You’ll see articles that say “Science now shows that chocolate is good for you.” Underpinning these claims is data analysis. By the end of this book, you’ll be able to better understand whether these claims should be trusted or whether we should be wary. Inside data analysis are many sub-fields that we will discuss throughout this book (though not necessarily in this order):

      • data collection
      • data wrangling
      • data visualization
      • data modeling
      • inference
      • correlation and regression
      • interpretation of results
      • data communication/storytelling

      These sub-fields are summarized in what Grolemund and Wickham term the “Data/Science Pipeline” in Figure 1.2.

      Figure 1.2: Data/Science Pipeline

      We will begin by digging into the gray Understand portion of the cycle with data visualization, then with a discussion on what is meant by tidy data and data wrangling, and then conclude by talking about interpreting and discussing the results of our models via Communication. These steps are vital to any statistical analysis. But why should you care about statistics? “Why did they make me take this class?”


      There’s a reason so many fields require a statistics course. Scientific knowledge grows through an understanding of statistical significance and data analysis. You needn’t be intimidated by statistics. It’s not the beast that it used to be and, paired with computation, you’ll see how reproducible research in the sciences particularly increases scientific knowledge.


      1.2.3 Reproducible research


      “The most important tool is the mindset, when starting, that the end product will be reproducible.” – Keith Baggerly


      Another goal of this book is to help readers understand the importance of reproducible analyses. The hope is to get readers into the habit of making their analyses reproducible from the very beginning. This means we’ll be trying to help you build new habits. This will take practice and be difficult at times. You’ll see just why it is so important to keep track of your code and to document it well, both to help yourself later and to help any potential collaborators.


      Copying and pasting results from one program into a word processor is not the way that efficient and effective scientific research is conducted. It’s much more important for time to be spent on data collection and data analysis and not on copying and pasting plots back and forth across a variety of programs.


      In a traditional analysis, if an error was made in the original data, we’d need to step through the entire process again: recreate the plots and copy and paste all of the new plots and our statistical analysis into our document. This is error-prone and a frustrating use of time. We’ll see how to use R Markdown to get away from this tedious activity so that we can spend more time doing science.
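      As a rough sketch of what this looks like in practice (using the ggplot2 and nycflights13 packages the book leans on throughout; the chunk shown here is illustrative, not taken from the book), a code chunk like the following placed inside an R Markdown document rebuilds its figure directly from the raw data every time the document is knit, so nothing ever needs to be copied and pasted:

```r
# Illustrative sketch of code you might put inside an R Markdown chunk:
# knitting the document re-runs it, so the histogram is always rebuilt
# from the raw data rather than pasted in from another program.
library(ggplot2)
library(nycflights13)

ggplot(data = weather, mapping = aes(x = temp)) +
  geom_histogram(bins = 40, color = "white")
```

      If the underlying data are later corrected, re-knitting the document regenerates the figure and every number derived from it in one step.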


      “We are talking about computational reproducibility.” - Yihui Xie


      Reproducibility means different things in different scientific fields. Are experiments conducted in a way that another researcher could follow the steps and get similar results? In this book, we will focus on what is known as computational reproducibility. This refers to being able to pass all of one’s data analysis, data-sets, and conclusions to someone else and have them get exactly the same results on their machine. This allows for time to be spent interpreting results and considering assumptions instead of the more error-prone way of starting from scratch or following a list of steps that may be different from machine to machine.


      1.2.4 Final note for students


      At this point, if you are interested in instructor perspectives on this book, ways to contribute and collaborate, or the technical details of this book’s construction and publishing, then continue with the rest of the chapter below. Otherwise, let’s get started with R and RStudio in Chapter 2!


      1.3 Introduction for instructors


      This book is inspired by the following books:

      • “Mathematical Statistics with Resampling and R” (Chihara and Hesterberg 2011),
      • “OpenIntro: Intro Stat with Randomization and Simulation” (Diez, Barr, and Çetinkaya-Rundel 2014), and
      • “R for Data Science” (Grolemund and Wickham 2016).

      The first book, while designed for upper-level undergraduates and graduate students, provides an excellent resource on how to use resampling to impart statistical concepts like sampling distributions using computation instead of large-sample approximations and other mathematical formulas. The last two books are free options for learning introductory statistics and data science, providing an alternative to the many traditionally expensive introductory statistics textbooks.


      When looking over the large number of introductory statistics textbooks that currently exist, we found that there wasn’t one that incorporated many newly developed R packages directly into the text, in particular the many packages included in the tidyverse collection of packages, such as ggplot2, dplyr, tidyr, and broom. Additionally, there wasn’t an open-source and easily reproducible textbook available that exposed new learners to all three of the learning goals listed at the outset of Subsection 1.2.1.


      1.3.1 Who is this book for?


      This book is intended for instructors of traditional introductory statistics classes using RStudio, either the desktop or server version, who would like to inject more data science topics into their syllabus. We assume that students taking the class will have no prior algebra, calculus, or programming/coding experience.


      Here are some principles and beliefs we kept in mind while writing this text. If you agree with them, this might be the book for you.

      1. Blur the lines between lecture and lab
         • With increased availability and accessibility of laptops and open-source non-proprietary statistical software, the strict dichotomy between lab and lecture can be loosened.
         • It’s much harder for students to understand the importance of using software if they only use it once a week or less. They forget the syntax in much the same way someone learning a foreign language forgets the rules. Frequent reinforcement is key.
      2. Focus on the entire data/science research pipeline
      3. It’s all about the data
         • We leverage R packages for rich, real, and realistic data-sets that at the same time are easy to load into R, such as the nycflights13 and fivethirtyeight packages.
         • We believe that data visualization is a gateway drug for statistics and that the Grammar of Graphics as implemented in the ggplot2 package is the best way to impart such lessons. However, we often hear: “You can’t teach ggplot2 for data visualization in intro stats!” We, like David Robinson, are much more optimistic.
         • dplyr has made data wrangling much more accessible to novices, and hence much more interesting data-sets can be explored.
      4. Use simulation/resampling to introduce statistical inference, not probability/mathematical formulas
         • Instead of using formulas, large-sample approximations, and probability tables, we teach statistical concepts using resampling-based inference.
         • This allows for a de-emphasis of traditional probability topics, freeing up room in the syllabus for other topics.
      5. Don’t fence off students from the computation pool, throw them in!
         • Computing skills are essential to working with data in the 21st century. Given this fact, we feel that to shield students from computing is to ultimately do them a disservice.
         • We are not teaching a course on coding/programming per se, but rather just enough of the computational and algorithmic thinking necessary for data analysis.
      6. Complete reproducibility and customizability
         • We are frustrated when textbooks give examples, but not the source code and the data itself. We give you the source code for all examples as well as the whole book!
         • Ultimately the best textbook is one you’ve written yourself. You know best your audience, their background, and their priorities. You know best your own style and the types of examples and problems you like best. Customization is the ultimate end. For more about how to make this book your own, see About this Book.

      1.4 DataCamp


      DataCamp is a browser-based interactive platform for learning data science, offering a wide array of courses on data science, analytics, statistics, machine learning, and artificial intelligence, where each course is a combination of lectures and exercises that offer immediate feedback.


      The following chapters of ModernDive roughly map to closely integrated DataCamp courses that use the same R tools and often even the same datasets. By no means is this an exhaustive list of DataCamp courses relevant to the topics in this book; rather, we recommend these ones in particular to supplement your ModernDive experience.


      Click on the image for each course to access its webpage on datacamp.com. Instructors at accredited universities can sign their class up for a free academic license at DataCamp For The Classroom, giving their students free access to all premium courses for 6 months.

      Chapter | Topic | DataCamp Courses
      2       | Basic R programming concepts | (course images)
      3 & 5   | Introductory data visualization and wrangling | (course image)
      4 & 5   | Data “tidying” and intermediate data wrangling | (course image)
      6 & 7   | Data modeling, basic regression, and multiple regression | (course image)
      9 & 10  | Statistical inference: confidence intervals and hypothesis testing | (course images)
      11      | Inference for regression | (course image)

      1.5 Connect and contribute


      If you would like to connect with ModernDive, check out the following links:


      If you would like to contribute to ModernDive, there are many ways! Let’s all work together to make this book as great as possible for as many students and instructors as possible!

      • Please let us know if you find any errors, typos, or areas for improvement on our GitHub issues page.
      • If you are familiar with GitHub and would like to contribute more, please see Section 1.6 below.

      The authors would like to thank Nina Sonneborn, Kristin Bott, and the participants of our USCOTS 2017 workshop for their feedback and suggestions. A special thanks goes to Prof. Yana Weinstein, cognitive psychological scientist and co-founder of The Learning Scientists, for her extensive contributions.


      1.6 About this book


      This book was written using RStudio’s bookdown package by Yihui Xie (Xie 2018). This package simplifies the publishing of books by having all content written in R Markdown. The bookdown/R Markdown source code for all versions of ModernDive is available on GitHub:


      Could this be a new paradigm for textbooks? Instead of the traditional model of textbook companies publishing updated editions of the textbook every few years, we apply a software-design-influenced model of publishing more easily updated versions. We can then leverage open-source communities of instructors and developers for ideas, tools, resources, and feedback. As such, we welcome your pull requests.


      Finally, feel free to modify the book as you wish for your own needs, but please list the authors at the top of index.Rmd as “Chester Ismay, Albert Y. Kim, and YOU!”
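      As a minimal sketch of that customization workflow (assuming you have cloned your own copy of the repository and installed the book’s other package dependencies), the entire book can be rebuilt from its R Markdown source with the bookdown package:

```r
# Illustrative sketch: rebuild the book from a local clone of the repository.
# Assumes the working directory is the repository root and that the book's
# other package dependencies are already installed.
install.packages("bookdown")        # if bookdown is not already installed
bookdown::render_book("index.Rmd")  # recompile the whole book from source
```

      After editing any chapter’s .Rmd file, re-running render_book() produces an updated copy of the book with your changes in place.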


      1.7 About the authors


      Who we are!

      Chester Ismay | Albert Y. Kim
      (author photo) | (author photo)
      + + + + + + + + + + + + + + diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot1-1.png new file mode 100644 index 000000000..2ec3bb210 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot4-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot4-1.png new file mode 100644 index 000000000..1c40650d5 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot4-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot5-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot5-1.png new file mode 100644 index 000000000..75f1b3198 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/2numxplot5-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/alpha-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/alpha-1.png new file mode 100644 index 000000000..e64a9d291 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/alpha-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/badbox-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/badbox-1.png new file mode 100644 index 000000000..b0424a705 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/badbox-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/bar-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/bar-1.png new file mode 100644 index 000000000..eac2ae532 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/bar-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/boxplot-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/boxplot-1.png new file mode 100644 index 000000000..b461e4d8d Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/boxplot-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/carrierpie-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/carrierpie-1.png new file mode 100644 index 000000000..c1b23c5d0 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/carrierpie-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot0b-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot0b-1.png new file mode 100644 index 000000000..7533ef4c0 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot0b-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot1-1.png new file mode 100644 index 000000000..59079169e Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot7-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot7-1.png new file mode 100644 index 000000000..c373f0614 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot7-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot8-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot8-1.png new file mode 100644 index 000000000..52ec8d8c9 Binary files 
/dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/catxplot8-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/comparing-sampling-distributions-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/comparing-sampling-distributions-1.png new file mode 100644 index 000000000..8e38890bd Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/comparing-sampling-distributions-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation1-1.png new file mode 100644 index 000000000..bc07984da Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation2-1.png new file mode 100644 index 000000000..19c3d3ce4 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/correlation2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/credit-limit-quartiles-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/credit-limit-quartiles-1.png new file mode 100644 index 000000000..05fd2a2c6 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/credit-limit-quartiles-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/facet-bar-vert-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/facet-bar-vert-1.png new file mode 100644 index 000000000..ae1e27ce4 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/facet-bar-vert-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/facethistogram-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/facethistogram-1.png new file mode 100644 index 000000000..b15743ff0 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/facethistogram-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/flightsbar-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/flightsbar-1.png new file mode 100644 index 000000000..5c819748e Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/flightsbar-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/flightscol-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/flightscol-1.png new file mode 100644 index 000000000..6c974ac5c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/flightscol-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/gapminder-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/gapminder-1.png new file mode 100644 index 000000000..945c2a767 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/gapminder-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/geombar-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/geombar-1.png new file mode 100644 index 000000000..ac51b9a4c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/geombar-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/geomcol-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/geomcol-1.png new file mode 100644 index 000000000..0e7da0c56 Binary files /dev/null and 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/geomcol-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/guatline-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/guatline-1.png new file mode 100644 index 000000000..f90e86717 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/guatline-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/here-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/here-1.png new file mode 100644 index 000000000..21cc533e4 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/here-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/hist-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist-1.png new file mode 100644 index 000000000..9aabd2385 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1a-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1a-1.png new file mode 100644 index 000000000..c22482a88 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1a-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1b-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1b-1.png new file mode 100644 index 000000000..ed39c46a3 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/hist1b-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/hourlytemp-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/hourlytemp-1.png new file mode 100644 index 000000000..5289175ca Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/hourlytemp-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-1.png new file mode 100644 index 000000000..1aff0c4d4 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-2-1.png new file mode 100644 index 000000000..87b9e1d29 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-3-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-3-1.png new file mode 100644 index 000000000..f7be7b354 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-interaction-3-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-parallel-slopes-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-parallel-slopes-1.png new file mode 100644 index 000000000..ae2dc50a1 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-price-parallel-slopes-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/house-prices-viz-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-prices-viz-1.png new file mode 100644 index 000000000..e24767ad9 Binary files /dev/null and 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/house-prices-viz-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-1.png new file mode 100644 index 000000000..89a84cdec Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-1-1.png new file mode 100644 index 000000000..35d0b1184 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-2-1.png new file mode 100644 index 000000000..1b56768ff Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/jitter-example-plot-2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/lifeExp2007hist-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/lifeExp2007hist-1.png new file mode 100644 index 000000000..9d795fdd9 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/lifeExp2007hist-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-price-viz-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-price-viz-1.png new file mode 100644 index 000000000..fa5dcdc0c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-price-viz-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-size-viz-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-size-viz-1.png new file mode 100644 index 000000000..b57a36c43 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/log10-size-viz-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/model1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/model1-1.png new file mode 100644 index 000000000..a9ba01c3c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/model1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/model1residualshist-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/model1residualshist-1.png new file mode 100644 index 000000000..99c74f9f3 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/model1residualshist-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/model2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/model2-1.png new file mode 100644 index 000000000..e4bf82faa Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/model2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/model3-residuals-hist-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/model3-residuals-hist-1.png new file mode 100644 index 000000000..1dba66321 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/model3-residuals-hist-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox-1.png new file mode 100644 index 000000000..b1494c193 Binary files /dev/null and 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox2-1.png new file mode 100644 index 000000000..677284899 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox3-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox3-1.png new file mode 100644 index 000000000..2b7c97f70 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/monthtempbox3-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/movie-hist-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/movie-hist-1.png new file mode 100644 index 000000000..2f468ce06 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/movie-hist-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/noalpha-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/noalpha-1.png new file mode 100644 index 000000000..9cbd57423 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/noalpha-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/nolayers-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/nolayers-1.png new file mode 100644 index 000000000..07492d6b9 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/nolayers-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot1-1.png new file mode 100644 index 000000000..80f396a81 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot2-1.png new file mode 100644 index 000000000..22004abf2 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxcatxplot2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot1-1.png new file mode 100644 index 000000000..0c3d6fc59 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-1.png new file mode 100644 index 000000000..257b8868f Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-a-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-a-1.png new file mode 100644 index 000000000..f51ed808e Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot2-a-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot3-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot3-1.png new file mode 100644 index 000000000..b16443c75 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot3-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot4-1.png 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot4-1.png new file mode 100644 index 000000000..a02e9ea6d Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot4-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot5-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot5-1.png new file mode 100644 index 000000000..cbd0db485 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot5-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot6-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot6-1.png new file mode 100644 index 000000000..250e5aba0 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot6-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot7-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot7-1.png new file mode 100644 index 000000000..9fe9d9ddb Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot7-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot9-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot9-1.png new file mode 100644 index 000000000..b1bbe1929 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/numxplot9-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/pvaloneprop-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/pvaloneprop-1.png new file mode 100644 index 000000000..354794075 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/pvaloneprop-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/qqplotmean-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/qqplotmean-1.png new file mode 100644 index 000000000..6497fc940 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/qqplotmean-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/residual1-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/residual1-1.png new file mode 100644 index 000000000..894639541 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/residual1-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/residual2-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/residual2-1.png new file mode 100644 index 000000000..e4d3802a7 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/residual2-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-tactile-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-tactile-1.png new file mode 100644 index 000000000..f223722fe Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-tactile-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1.png new file mode 100644 index 000000000..2893c6c1b Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1000-1.png 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1000-1.png new file mode 100644 index 000000000..68d82d93f Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/samplingdistribution-virtual-1000-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/stacked_bar-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/stacked_bar-1.png new file mode 100644 index 000000000..07c033577 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/stacked_bar-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-conf-int-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-conf-int-1.png new file mode 100644 index 000000000..44a656cb9 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-conf-int-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-vs-virtual-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-vs-virtual-1.png new file mode 100644 index 000000000..45267ae95 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/tactile-vs-virtual-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-121-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-121-1.png new file mode 100644 index 000000000..ca28c859b Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-121-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-211-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-198-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-211-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-198-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-212-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-199-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-212-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-199-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-244-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-226-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-244-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-226-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-240-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-240-1.png new file mode 100644 index 000000000..2ec3bb210 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-240-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-26-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-26-1.png new file mode 100644 index 000000000..2eff27fd8 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-26-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-27-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-27-1.png new file mode 100644 index 000000000..4cd5d2522 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-27-1.png differ diff --git 
a/docs/ismaykim_files/figure-html/unnamed-chunk-294-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-275-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-294-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-275-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-279-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-279-1.png new file mode 100644 index 000000000..b15d00000 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-279-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-28-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-28-1.png new file mode 100644 index 000000000..52137cb8e Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-28-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-282-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-282-1.png new file mode 100644 index 000000000..aba8f4003 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-282-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-29-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-29-1.png new file mode 100644 index 000000000..610bc7ff9 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-29-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-295-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-295-1.png new file mode 100644 index 000000000..0585865d8 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-295-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-297-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-297-1.png new file mode 100644 index 000000000..ecc2e03aa Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-297-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-30-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-30-1.png new file mode 100644 index 000000000..95cb41aad Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-30-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-301-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-301-1.png new file mode 100644 index 000000000..67f3cebba Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-301-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-304-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-304-1.png new file mode 100644 index 000000000..59b11718b Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-304-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-305-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-305-1.png new file mode 100644 index 000000000..b8923fa33 Binary files /dev/null and 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-305-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-296-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-307-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-296-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-307-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-311-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-311-1.png new file mode 100644 index 000000000..887295abf Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-311-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-313-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-313-1.png new file mode 100644 index 000000000..361b4cf56 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-313-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-342-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-321-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-342-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-321-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-322-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-322-1.png new file mode 100644 index 000000000..3f91edc4c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-322-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-330-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-330-1.png new file mode 100644 index 000000000..f5ebba1b3 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-330-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-332-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-332-1.png new file mode 100644 index 000000000..b66803a57 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-332-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-347-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-347-1.png new file mode 100644 index 000000000..51c650433 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-347-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-380-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-357-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-380-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-357-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-387-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-366-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-387-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-366-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-392-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-369-1.png similarity index 100% rename from 
docs/ismaykim_files/figure-html/unnamed-chunk-392-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-369-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-391-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-370-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-391-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-370-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-382-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-382-1.png new file mode 100644 index 000000000..fb7e59457 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-382-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-383-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-383-1.png new file mode 100644 index 000000000..a7c90d982 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-383-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-384-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-384-1.png new file mode 100644 index 000000000..38e224167 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-384-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-389-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-389-1.png new file mode 100644 index 000000000..7b141a8b0 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-389-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-411-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-390-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-411-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-390-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-392-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-392-1.png new file mode 100644 index 000000000..61c1fc57b Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-392-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-393-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-393-1.png new file mode 100644 index 000000000..81bb3ed7e Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-393-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-395-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-395-1.png new file mode 100644 index 000000000..2f1945041 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-395-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-404-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-404-1.png new file mode 100644 index 000000000..4efe03a77 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-404-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-434-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-410-1.png 
similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-434-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-410-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-439-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-439-1.png new file mode 100644 index 000000000..7f18858c5 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-439-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-447-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-447-1.png new file mode 100644 index 000000000..c5de3a043 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-447-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-448-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-448-1.png new file mode 100644 index 000000000..987c77048 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-448-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-452-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-452-1.png new file mode 100644 index 000000000..23529eddf Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-452-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-455-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-455-1.png new file mode 100644 index 000000000..e5fc89b14 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-455-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-456-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-456-1.png new file mode 100644 index 000000000..9fba85a96 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-456-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-460-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-460-1.png new file mode 100644 index 000000000..d8fdb2a55 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-460-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-465-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-465-1.png new file mode 100644 index 000000000..8d5cbc32d Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-465-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-466-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-466-1.png new file mode 100644 index 000000000..d58a15b80 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-466-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-470-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-470-1.png new file mode 100644 index 000000000..7330b399c Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-470-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-474-1.png 
b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-474-1.png new file mode 100644 index 000000000..70da38928 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-474-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-475-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-475-1.png new file mode 100644 index 000000000..08e92540f Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-475-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-479-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-479-1.png new file mode 100644 index 000000000..ca3cdd4bf Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-479-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-508-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-483-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-508-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-483-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-484-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-484-1.png new file mode 100644 index 000000000..c561dced3 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-484-1.png differ diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-488-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-488-1.png new file mode 100644 index 000000000..38bc86fb9 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-488-1.png differ diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-519-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-494-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-519-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-494-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-53-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-50-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-53-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-50-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-56-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-53-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-56-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-53-1.png diff --git a/docs/ismaykim_files/figure-html/unnamed-chunk-73-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-70-1.png similarity index 100% rename from docs/ismaykim_files/figure-html/unnamed-chunk-73-1.png rename to previous_versions/v0.4.0/ismaykim_files/figure-html/unnamed-chunk-70-1.png diff --git a/previous_versions/v0.4.0/ismaykim_files/figure-html/virtual-conf-int-1.png b/previous_versions/v0.4.0/ismaykim_files/figure-html/virtual-conf-int-1.png new file mode 100644 index 000000000..a3dff48e8 Binary files /dev/null and b/previous_versions/v0.4.0/ismaykim_files/figure-html/virtual-conf-int-1.png differ diff --git a/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph-combined.js 
b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph-combined.js new file mode 100644 index 000000000..7d6121e1d --- /dev/null +++ b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph-combined.js @@ -0,0 +1,6 @@ +/*! @license Copyright 2014 Dan Vanderkam (danvdk@gmail.com) MIT-licensed (http://opensource.org/licenses/MIT) */ +!function(t){"use strict";for(var e,a,i={},r=function(){},n="memory".split(","),o="assert,clear,count,debug,dir,dirxml,error,exception,group,groupCollapsed,groupEnd,info,log,markTimeline,profile,profiles,profileEnd,show,table,time,timeEnd,timeline,timelineEnd,timeStamp,trace,warn".split(",");e=n.pop();)t[e]=t[e]||i;for(;a=o.pop();)t[a]=t[a]||r}(this.console=this.console||{}),function(){"use strict";CanvasRenderingContext2D.prototype.installPattern=function(t){if("undefined"!=typeof this.isPatternInstalled)throw"Must un-install old line pattern before installing a new one.";this.isPatternInstalled=!0;var e=[0,0],a=[],i=this.beginPath,r=this.lineTo,n=this.moveTo,o=this.stroke;this.uninstallPattern=function(){this.beginPath=i,this.lineTo=r,this.moveTo=n,this.stroke=o,this.uninstallPattern=void 0,this.isPatternInstalled=void 0},this.beginPath=function(){a=[],i.call(this)},this.moveTo=function(t,e){a.push([[t,e]]),n.call(this,t,e)},this.lineTo=function(t,e){var i=a[a.length-1];i.push([t,e])},this.stroke=function(){if(0===a.length)return void o.call(this);for(var i=0;if;){var x=t[v];f+=e[1]?e[1]:x,f>y?(e=[v,f-y],f=y):e=[(v+1)%t.length,0],v%2===0?r.call(this,f,0):n.call(this,f,0),v=(v+1)%t.length}this.restore(),l=g,h=d}o.call(this),a=[]}},CanvasRenderingContext2D.prototype.uninstallPattern=function(){throw"Must install a line pattern before uninstalling it."}}();var DygraphOptions=function(){return function(){"use strict";var t=function(t){this.dygraph_=t,this.yAxes_=[],this.xAxis_={},this.series_={},this.global_=this.dygraph_.attrs_,this.user_=this.dygraph_.user_attrs_||{},this.labels_=[],this.highlightSeries_=this.get("highlightSeriesOpts")||{},this.reparseSeries()};t.AXIS_STRING_MAPPINGS_={y:0,Y:0,y1:0,Y1:0,y2:1,Y2:1},t.axisToIndex_=function(e){if("string"==typeof e){if(t.AXIS_STRING_MAPPINGS_.hasOwnProperty(e))return t.AXIS_STRING_MAPPINGS_[e];throw"Unknown axis : "+e}if("number"==typeof e){if(0===e||1===e)return e;throw"Dygraphs only supports two y-axes, indexed from 0-1."}if(e)throw"Unknown axis : "+e;return 0},t.prototype.reparseSeries=function(){var e=this.get("labels");if(e){this.labels_=e.slice(1),this.yAxes_=[{series:[],options:{}}],this.xAxis_={options:{}},this.series_={};var a=!this.user_.series;if(a){for(var i=0,r=0;r1&&Dygraph.update(this.yAxes_[1].options,h.y2||{}),Dygraph.update(this.xAxis_.options,h.x||{})}},t.prototype.get=function(t){var e=this.getGlobalUser_(t);return null!==e?e:this.getGlobalDefault_(t)},t.prototype.getGlobalUser_=function(t){return this.user_.hasOwnProperty(t)?this.user_[t]:null},t.prototype.getGlobalDefault_=function(t){return this.global_.hasOwnProperty(t)?this.global_[t]:Dygraph.DEFAULT_ATTRS.hasOwnProperty(t)?Dygraph.DEFAULT_ATTRS[t]:null},t.prototype.getForAxis=function(t,e){var a,i;if("number"==typeof e)a=e,i=0===a?"y":"y2";else{if("y1"==e&&(e="y"),"y"==e)a=0;else if("y2"==e)a=1;else{if("x"!=e)throw"Unknown axis "+e;a=-1}i=e}var r=-1==a?this.xAxis_:this.yAxes_[a];if(r){var n=r.options;if(n.hasOwnProperty(t))return n[t]}if("x"!==e||"logscale"!==t){var o=this.getGlobalUser_(t);if(null!==o)return o}var s=Dygraph.DEFAULT_ATTRS.axes[i];return 
s.hasOwnProperty(t)?s[t]:this.getGlobalDefault_(t)},t.prototype.getForSeries=function(t,e){if(e===this.dygraph_.getHighlightSeries()&&this.highlightSeries_.hasOwnProperty(t))return this.highlightSeries_[t];if(!this.series_.hasOwnProperty(e))throw"Unknown series: "+e;var a=this.series_[e],i=a.options;return i.hasOwnProperty(t)?i[t]:this.getForAxis(t,a.yAxis)},t.prototype.numAxes=function(){return this.yAxes_.length},t.prototype.axisForSeries=function(t){return this.series_[t].yAxis},t.prototype.axisOptions=function(t){return this.yAxes_[t].options},t.prototype.seriesForAxis=function(t){return this.yAxes_[t].series},t.prototype.seriesNames=function(){return this.labels_};return t}()}(),DygraphLayout=function(){"use strict";var t=function(t){this.dygraph_=t,this.points=[],this.setNames=[],this.annotations=[],this.yAxes_=null,this.xTicks_=null,this.yTicks_=null};return t.prototype.addDataset=function(t,e){this.points.push(e),this.setNames.push(t)},t.prototype.getPlotArea=function(){return this.area_},t.prototype.computePlotArea=function(){var t={x:0,y:0};t.w=this.dygraph_.width_-t.x-this.dygraph_.getOption("rightGap"),t.h=this.dygraph_.height_;var e={chart_div:this.dygraph_.graphDiv,reserveSpaceLeft:function(e){var a={x:t.x,y:t.y,w:e,h:t.h};return t.x+=e,t.w-=e,a},reserveSpaceRight:function(e){var a={x:t.x+t.w-e,y:t.y,w:e,h:t.h};return t.w-=e,a},reserveSpaceTop:function(e){var a={x:t.x,y:t.y,w:t.w,h:e};return t.y+=e,t.h-=e,a},reserveSpaceBottom:function(e){var a={x:t.x,y:t.y+t.h-e,w:t.w,h:e};return t.h-=e,a},chartRect:function(){return{x:t.x,y:t.y,w:t.w,h:t.h}}};this.dygraph_.cascadeEvents_("layout",e),this.area_=t},t.prototype.setAnnotations=function(t){this.annotations=[];for(var e=this.dygraph_.getOption("xValueParser")||function(t){return t},a=0;a=0&&1>i&&this.xticks.push([i,a]);for(this.yticks=[],t=0;t0&&1>=i&&this.yticks.push([t,i,a])},t.prototype._evaluateAnnotations=function(){var t,e={};for(t=0;t=0;i--)a.childNodes[i].className==e&&a.removeChild(a.childNodes[i]);for(var r=document.bgColor,n=this.dygraph_.graphDiv;n!=document;){var o=n.currentStyle.backgroundColor;if(o&&"transparent"!=o){r=o;break}n=n.parentNode}var s=this.area;t({x:0,y:0,w:s.x,h:this.height}),t({x:s.x,y:0,w:this.width-s.x,h:s.y}),t({x:s.x+s.w,y:0,w:this.width-s.x-s.w,h:this.height}),t({x:s.x,y:s.y+s.h,w:this.width-s.x,h:this.height-s.h-s.y})},t._getIteratorPredicate=function(e){return e?t._predicateThatSkipsEmptyPoints:null},t._predicateThatSkipsEmptyPoints=function(t,e){return null!==t[e].yval},t._drawStyledLine=function(e,a,i,r,n,o,s){var l=e.dygraph,h=l.getBooleanOption("stepPlot",e.setName);Dygraph.isArrayLike(r)||(r=null);var p=l.getBooleanOption("drawGapEdgePoints",e.setName),g=e.points,d=e.setName,u=Dygraph.createIterator(g,0,g.length,t._getIteratorPredicate(l.getBooleanOption("connectSeparatedPoints",d))),c=r&&r.length>=2,y=e.drawingContext;y.save(),c&&y.installPattern(r);var _=t._drawSeries(e,u,i,s,n,p,h,a);t._drawPointsOnLine(e,_,o,a,s),c&&y.uninstallPattern(),y.restore()},t._drawSeries=function(t,e,a,i,r,n,o,s){var l,h,p=null,g=null,d=null,u=[],c=!0,y=t.drawingContext;y.beginPath(),y.strokeStyle=s,y.lineWidth=a;for(var _=e.array_,v=e.end_,f=e.predicate_,x=e.start_;v>x;x++){if(h=_[x],f){for(;v>x&&!f(_,x);)x++;if(x==v)break;h=_[x]}if(null===h.canvasy||h.canvasy!=h.canvasy)o&&null!==p&&(y.moveTo(p,g),y.lineTo(h.canvasx,g)),p=g=null;else{if(l=!1,n||!p){e.nextIdx_=x,e.next(),d=e.hasNext?e.peek.canvasy:null;var 
m=null===d||d!=d;l=!p&&m,n&&(!c&&!p||e.hasNext&&m)&&(l=!0)}null!==p?a&&(o&&(y.moveTo(p,g),y.lineTo(h.canvasx,g)),y.lineTo(h.canvasx,h.canvasy)):y.moveTo(h.canvasx,h.canvasy),(r||l)&&u.push([h.canvasx,h.canvasy,h.idx]),p=h.canvasx,g=h.canvasy}c=!1}return y.stroke(),u},t._drawPointsOnLine=function(t,e,a,i,r){for(var n=t.drawingContext,o=0;o0;a--){var i=e[a];if(i[0]==n){var o=e[a-1];o[1]==i[1]&&o[2]==i[2]&&e.splice(a,1)}}for(var a=0;a2&&!t){var s=0;e[0][0]==n&&s++;for(var l=null,h=null,a=s;ae[h][2]&&(h=a)}}var g=e[l],d=e[h];e.splice(s,e.length-s),h>l?(e.push(g),e.push(d)):l>h?(e.push(d),e.push(g)):e.push(g)}}},l=function(a){s(a);for(var l=0,h=e.length;h>l;l++){var p=e[l];p[0]==r?t.lineTo(p[1],p[2]):p[0]==n&&t.moveTo(p[1],p[2])}e.length&&(i=e[e.length-1][1]),o+=e.length,e=[]},h=function(t,r,n){var o=Math.round(r);if(null===a||o!=a){var s=a-i>1,h=o-a>1,p=s||h;l(p),a=o}e.push([t,r,n])};return{moveTo:function(t,e){h(n,t,e)},lineTo:function(t,e){h(r,t,e)},stroke:function(){l(!0),t.stroke()},fill:function(){l(!0),t.fill()},beginPath:function(){l(!0),t.beginPath()},closePath:function(){l(!0),t.closePath()},_count:function(){return o}}},t._fillPlotter=function(e){if(!e.singleSeriesName&&0===e.seriesIndex){for(var a=e.dygraph,i=a.getLabels().slice(1),r=i.length;r>=0;r--)a.visibility()[r]||i.splice(r,1);var n=function(){for(var t=0;t=0;r--){var n=i[r];t.lineTo(n[0],n[1])}},_=p-1;_>=0;_--){var v=e.drawingContext,f=i[_];if(a.getBooleanOption("fillGraph",f)){var x=a.getBooleanOption("stepPlot",f),m=u[_],D=a.axisPropertiesForSeries(f),w=1+D.minyval*D.yscale;0>w?w=0:w>1&&(w=1),w=l.h*w+l.y;var A,b=h[_],T=Dygraph.createIterator(b,0,b.length,t._getIteratorPredicate(a.getBooleanOption("connectSeparatedPoints",f))),E=0/0,C=[-1,-1],L=Dygraph.toRGB_(m),P="rgba("+L.r+","+L.g+","+L.b+","+g+")";v.fillStyle=P,v.beginPath();var S,O=!0;(b.length>2*a.width_||Dygraph.FORCE_FAST_PROXY)&&(v=t._fastCanvasProxy(v));for(var M,R=[];T.hasNext;)if(M=T.next(),Dygraph.isOK(M.y)||x){if(d){if(!O&&S==M.xval)continue;O=!1,S=M.xval,o=c[M.canvasx];var F;F=void 0===o?w:s?o[0]:o,A=[M.canvasy,F],x?-1===C[0]?c[M.canvasx]=[M.canvasy,w]:c[M.canvasx]=[M.canvasy,C[0]]:c[M.canvasx]=M.canvasy}else A=isNaN(M.canvasy)&&x?[l.y+l.h,w]:[M.canvasy,w];isNaN(E)?(v.moveTo(M.canvasx,A[1]),v.lineTo(M.canvasx,A[0])):(x?(v.lineTo(M.canvasx,C[0]),v.lineTo(M.canvasx,A[0])):v.lineTo(M.canvasx,A[0]),d&&(R.push([E,C[1]]),R.push(s&&o?[M.canvasx,o[1]]:[M.canvasx,A[1]]))),C=A,E=M.canvasx}else y(v,E,C[1],R),R=[],E=0/0,null===M.y_stacked||isNaN(M.y_stacked)||(c[M.canvasx]=l.h*M.y_stacked+l.y);s=x,A&&M&&(y(v,M.canvasx,A[1],R),R=[]),v.fill()}}}},t}(),Dygraph=function(){"use strict";var t=function(t,e,a,i){this.is_initial_draw_=!0,this.readyFns_=[],void 0!==i?(console.warn("Using deprecated four-argument dygraph constructor"),this.__old_init__(t,e,a,i)):this.__init__(t,e,a)};return t.NAME="Dygraph",t.VERSION="1.1.1",t.__repr__=function(){return"["+t.NAME+" "+t.VERSION+"]"},t.toString=function(){return t.__repr__()},t.DEFAULT_ROLL_PERIOD=1,t.DEFAULT_WIDTH=480,t.DEFAULT_HEIGHT=320,t.ANIMATION_STEPS=12,t.ANIMATION_DURATION=200,t.KMB_LABELS=["K","M","B","T","Q"],t.KMG2_BIG_LABELS=["k","M","G","T","P","E","Z","Y"],t.KMG2_SMALL_LABELS=["m","u","n","p","f","a","z","y"],t.numberValueFormatter=function(e,a){var i=a("sigFigs");if(null!==i)return t.floatFormat(e,i);var r,n=a("digitsAfterDecimal"),o=a("maxNumberWidth"),s=a("labelsKMB"),l=a("labelsKMG2");if(r=0!==e&&(Math.abs(e)>=Math.pow(10,o)||Math.abs(e)=0;c--,u/=h)if(d>=u){r=t.round_(e/u,n)+p[c];break}if(l){var 
y=String(e.toExponential()).split("e-");2===y.length&&y[1]>=3&&y[1]<=24&&(r=y[1]%3>0?t.round_(y[0]/t.pow(10,y[1]%3),n):Number(y[0]).toFixed(2),r+=g[Math.floor(y[1]/3)-1])}}return r},t.numberAxisLabelFormatter=function(e,a,i){return t.numberValueFormatter.call(this,e,i)},t.SHORT_MONTH_NAMES_=["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],t.dateAxisLabelFormatter=function(e,a,i){var r=i("labelsUTC"),n=r?t.DateAccessorsUTC:t.DateAccessorsLocal,o=n.getFullYear(e),s=n.getMonth(e),l=n.getDate(e),h=n.getHours(e),p=n.getMinutes(e),g=n.getSeconds(e),d=n.getSeconds(e);if(a>=t.DECADAL)return""+o;if(a>=t.MONTHLY)return t.SHORT_MONTH_NAMES_[s]+" "+o;var u=3600*h+60*p+g+.001*d;return 0===u||a>=t.DAILY?t.zeropad(l)+" "+t.SHORT_MONTH_NAMES_[s]:t.hmsString_(h,p,g)},t.dateAxisFormatter=t.dateAxisLabelFormatter,t.dateValueFormatter=function(e,a){return t.dateString_(e,a("labelsUTC"))},t.Plotters=DygraphCanvasRenderer._Plotters,t.DEFAULT_ATTRS={highlightCircleSize:3,highlightSeriesOpts:null,highlightSeriesBackgroundAlpha:.5,labelsDivWidth:250,labelsDivStyles:{},labelsSeparateLines:!1,labelsShowZeroValues:!0,labelsKMB:!1,labelsKMG2:!1,showLabelsOnHighlight:!0,digitsAfterDecimal:2,maxNumberWidth:6,sigFigs:null,strokeWidth:1,strokeBorderWidth:0,strokeBorderColor:"white",axisTickSize:3,axisLabelFontSize:14,rightGap:5,showRoller:!1,xValueParser:t.dateParser,delimiter:",",sigma:2,errorBars:!1,fractions:!1,wilsonInterval:!0,customBars:!1,fillGraph:!1,fillAlpha:.15,connectSeparatedPoints:!1,stackedGraph:!1,stackedGraphNaNFill:"all",hideOverlayOnMouseOut:!0,legend:"onmouseover",stepPlot:!1,avoidMinZero:!1,xRangePad:0,yRangePad:null,drawAxesAtZero:!1,titleHeight:28,xLabelHeight:18,yLabelWidth:18,drawXAxis:!0,drawYAxis:!0,axisLineColor:"black",axisLineWidth:.3,gridLineWidth:.3,axisLabelColor:"black",axisLabelWidth:50,drawYGrid:!0,drawXGrid:!0,gridLineColor:"rgb(128,128,128)",interactionModel:null,animatedZooms:!1,showRangeSelector:!1,rangeSelectorHeight:40,rangeSelectorPlotStrokeColor:"#808FAB",rangeSelectorPlotFillColor:"#A7B1C4",showInRangeSelector:null,plotter:[t.Plotters.fillPlotter,t.Plotters.errorPlotter,t.Plotters.linePlotter],plugins:[],axes:{x:{pixelsPerLabel:70,axisLabelWidth:60,axisLabelFormatter:t.dateAxisLabelFormatter,valueFormatter:t.dateValueFormatter,drawGrid:!0,drawAxis:!0,independentTicks:!0,ticker:null},y:{axisLabelWidth:50,pixelsPerLabel:30,valueFormatter:t.numberValueFormatter,axisLabelFormatter:t.numberAxisLabelFormatter,drawGrid:!0,drawAxis:!0,independentTicks:!0,ticker:null},y2:{axisLabelWidth:50,pixelsPerLabel:30,valueFormatter:t.numberValueFormatter,axisLabelFormatter:t.numberAxisLabelFormatter,drawAxis:!0,drawGrid:!1,independentTicks:!1,ticker:null}}},t.HORIZONTAL=1,t.VERTICAL=2,t.PLUGINS=[],t.addedAnnotationCSS=!1,t.prototype.__old_init__=function(e,a,i,r){if(null!==i){for(var n=["Date"],o=0;o=0;n--){var o=r[n][0],s=r[n][1];if(s.call(o,i),i.propagationStopped)break}return i.defaultPrevented},t.prototype.getPluginInstance_=function(t){for(var e=0;et||t>=this.axes_.length)return null;var e=this.axes_[t];return[e.computedValueRange[0],e.computedValueRange[1]]},t.prototype.yAxisRanges=function(){for(var t=[],e=0;et||t>this.rawData_.length?null:0>e||e>this.rawData_[t].length?null:this.rawData_[t][e]},t.prototype.createInterface_=function(){var 
e=this.maindiv_;this.graphDiv=document.createElement("div"),this.graphDiv.style.textAlign="left",this.graphDiv.style.position="relative",e.appendChild(this.graphDiv),this.canvas_=t.createCanvas(),this.canvas_.style.position="absolute",this.hidden_=this.createPlotKitCanvas_(this.canvas_),this.canvas_ctx_=t.getContext(this.canvas_),this.hidden_ctx_=t.getContext(this.hidden_),this.resizeElements_(),this.graphDiv.appendChild(this.hidden_),this.graphDiv.appendChild(this.canvas_),this.mouseEventElement_=this.createMouseEventElement_(),this.layout_=new DygraphLayout(this);var a=this;this.mouseMoveHandler_=function(t){a.mouseMove_(t)},this.mouseOutHandler_=function(e){var i=e.target||e.fromElement,r=e.relatedTarget||e.toElement;t.isNodeContainedBy(i,a.graphDiv)&&!t.isNodeContainedBy(r,a.graphDiv)&&a.mouseOut_(e)},this.addAndTrackEvent(window,"mouseout",this.mouseOutHandler_),this.addAndTrackEvent(this.mouseEventElement_,"mousemove",this.mouseMoveHandler_),this.resizeHandler_||(this.resizeHandler_=function(t){a.resize()},this.addAndTrackEvent(window,"resize",this.resizeHandler_))},t.prototype.resizeElements_=function(){this.graphDiv.style.width=this.width_+"px",this.graphDiv.style.height=this.height_+"px";var e=t.getContextPixelRatio(this.canvas_ctx_);this.canvas_.width=this.width_*e,this.canvas_.height=this.height_*e,this.canvas_.style.width=this.width_+"px",this.canvas_.style.height=this.height_+"px",1!==e&&this.canvas_ctx_.scale(e,e);var a=t.getContextPixelRatio(this.hidden_ctx_);this.hidden_.width=this.width_*a,this.hidden_.height=this.height_*a,this.hidden_.style.width=this.width_+"px",this.hidden_.style.height=this.height_+"px",1!==a&&this.hidden_ctx_.scale(a,a)},t.prototype.destroy=function(){this.canvas_ctx_.restore(),this.hidden_ctx_.restore();for(var e=this.plugins_.length-1;e>=0;e--){var a=this.plugins_.pop();a.plugin.destroy&&a.plugin.destroy()}var i=function(t){for(;t.hasChildNodes();)i(t.firstChild),t.removeChild(t.firstChild)};this.removeTrackedEvents_(),t.removeEvent(window,"mouseout",this.mouseOutHandler_),t.removeEvent(this.mouseEventElement_,"mousemove",this.mouseMoveHandler_),t.removeEvent(window,"resize",this.resizeHandler_),this.resizeHandler_=null,i(this.maindiv_);var r=function(t){for(var e in t)"object"==typeof t[e]&&(t[e]=null)};r(this.layout_),r(this.plotter_),r(this)},t.prototype.createPlotKitCanvas_=function(e){var a=t.createCanvas();return a.style.position="absolute",a.style.top=e.style.top,a.style.left=e.style.left,a.width=this.width_,a.height=this.height_,a.style.width=this.width_+"px",a.style.height=this.height_+"px",a},t.prototype.createMouseEventElement_=function(){if(this.isUsingExcanvas_){var t=document.createElement("div");return t.style.position="absolute",t.style.backgroundColor="white",t.style.filter="alpha(opacity=0)",t.style.width=this.width_+"px",t.style.height=this.height_+"px",this.graphDiv.appendChild(t),t}return this.canvas_},t.prototype.setColors_=function(){var e=this.getLabels(),a=e.length-1;this.colors_=[],this.colorsMap_={};for(var i=this.getNumericOption("colorSaturation")||1,r=this.getNumericOption("colorValue")||.5,n=Math.ceil(a/2),o=this.getOption("colors"),s=this.visibility(),l=0;a>l;l++)if(s[l]){ +var h=e[l+1],p=this.attributes_.getForSeries("color",h);if(!p)if(o)p=o[l%o.length];else{var g=l%2?n+(l+1)/2:Math.ceil((l+1)/2),d=1*g/(1+a);p=t.hsvToRGB(d,i,r)}this.colors_.push(p),this.colorsMap_[h]=p}},t.prototype.getColors=function(){return this.colors_},t.prototype.getPropertiesForSeries=function(t){for(var 
e=-1,a=this.getLabels(),i=1;i=o;o++)s=t.zoomAnimationFunction(o,l),h[o-1]=[e[0]*(1-s)+s*a[0],e[1]*(1-s)+s*a[1]];if(null!==i&&null!==r)for(o=1;l>=o;o++){s=t.zoomAnimationFunction(o,l);for(var g=[],d=0;dl;l++){var h=o[l];if(t.isValidPoint(h,!0)){var p=Math.abs(h.canvasx-e);a>p&&(a=p,i=h.idx)}}return i},t.prototype.findClosestPoint=function(e,a){for(var i,r,n,o,s,l,h,p=1/0,g=this.layout_.points.length-1;g>=0;--g)for(var d=this.layout_.points[g],u=0;ui&&(p=i,s=o,l=g,h=o.idx));var c=this.layout_.setNames[l];return{row:h,seriesName:c,point:s}},t.prototype.findStackedPoint=function(e,a){for(var i,r,n=this.findClosestRow(e),o=0;o=h.length)){var p=h[l];if(t.isValidPoint(p)){var g=p.canvasy;if(e>p.canvasx&&l+10){var c=(e-p.canvasx)/u;g+=c*(d.canvasy-p.canvasy)}}}else if(e0){var y=h[l-1];if(t.isValidPoint(y)){var u=p.canvasx-y.canvasx;if(u>0){var c=(p.canvasx-e)/u;g+=c*(y.canvasy-p.canvasy)}}}(0===o||a>g)&&(i=p,r=o)}}}var _=this.layout_.setNames[r];return{row:n,seriesName:_,point:i}},t.prototype.mouseMove_=function(t){var e=this.layout_.points;if(void 0!==e&&null!==e){var a=this.eventToDomCoords(t),i=a[0],r=a[1],n=this.getOption("highlightSeriesOpts"),o=!1;if(n&&!this.isSeriesLocked()){var s;s=this.getBooleanOption("stackedGraph")?this.findStackedPoint(i,r):this.findClosestPoint(i,r),o=this.setSelection(s.row,s.seriesName)}else{var l=this.findClosestRow(i);o=this.setSelection(l)}var h=this.getFunctionOption("highlightCallback");h&&o&&h.call(this,t,this.lastx_,this.selPoints_,this.lastRow_,this.highlightSet_)}},t.prototype.getLeftBoundary_=function(t){if(this.boundaryIds_[t])return this.boundaryIds_[t][0];for(var e=0;ee?r:a-r;if(0>=n)return void(this.fadeLevel&&this.updateSelection_(1));var o=++this.animateId,s=this;t.repeatAndCleanup(function(t){s.animateId==o&&(s.fadeLevel+=e,0===s.fadeLevel?s.clearSelection():s.updateSelection_(s.fadeLevel/a))},n,i,function(){})},t.prototype.updateSelection_=function(e){this.cascadeEvents_("select",{selectedRow:this.lastRow_,selectedX:this.lastx_,selectedPoints:this.selPoints_});var a,i=this.canvas_ctx_;if(this.getOption("highlightSeriesOpts")){i.clearRect(0,0,this.width_,this.height_);var r=1-this.getNumericOption("highlightSeriesBackgroundAlpha");if(r){var n=!0;if(n){if(void 0===e)return void this.animateSelection_(1);r*=e}i.fillStyle="rgba(255,255,255,"+r+")",i.fillRect(0,0,this.width_,this.height_)}this.plotter_._renderLineChart(this.highlightSet_,i)}else if(this.previousVerticalX_>=0){var o=0,s=this.attr_("labels");for(a=1;ao&&(o=l)}var h=this.previousVerticalX_;i.clearRect(h-o-1,0,2*o+2,this.height_)}if(this.isUsingExcanvas_&&this.currentZoomRectArgs_&&t.prototype.drawZoomRect_.apply(this,this.currentZoomRectArgs_),this.selPoints_.length>0){var p=this.selPoints_[0].canvasx;for(i.save(),a=0;a=0){t!=this.lastRow_&&(i=!0),this.lastRow_=t;for(var r=0;r=0&&(i=!0),this.lastRow_=-1;return this.selPoints_.length?this.lastx_=this.selPoints_[0].xval:this.lastx_=-1,void 0!==e&&(this.highlightSet_!==e&&(i=!0),this.highlightSet_=e),void 0!==a&&(this.lockedSet_=a),i&&this.updateSelection_(void 0),i},t.prototype.mouseOut_=function(t){this.getFunctionOption("unhighlightCallback")&&this.getFunctionOption("unhighlightCallback").call(this,t),this.getBooleanOption("hideOverlayOnMouseOut")&&!this.lockedSet_&&this.clearSelection()},t.prototype.clearSelection=function(){return this.cascadeEvents_("deselect",{}),this.lockedSet_=!1,this.fadeLevel?void 
this.animateSelection_(-1):(this.canvas_ctx_.clearRect(0,0,this.width_,this.height_),this.fadeLevel=0,this.selPoints_=[],this.lastx_=-1,this.lastRow_=-1,void(this.highlightSet_=null))},t.prototype.getSelection=function(){if(!this.selPoints_||this.selPoints_.length<1)return-1;for(var t=0;t1&&(a=this.dataHandler_.rollingAverage(a,this.rollPeriod_,this.attributes_)),this.rolledSeries_.push(a)}this.drawGraph_();var i=new Date;this.drawingTimeMs_=i-t},t.PointType=void 0,t.stackPoints_=function(t,e,a,i){for(var r=null,n=null,o=null,s=-1,l=function(e){if(!(s>=e))for(var a=e;aa[1]&&(a[1]=u),u=1;i--)if(this.visibility()[i-1]){if(a){l=e[i];var c=a[0],y=a[1];for(n=null,o=null,r=0;r=c&&null===n&&(n=r),l[r][0]<=y&&(o=r);null===n&&(n=0);for(var _=n,v=!0;v&&_>0;)_--,v=null===l[_][1];null===o&&(o=l.length-1);var f=o;for(v=!0;v&&f0&&(this.setIndexByName_[n[0]]=0);for(var o=0,s=1;s0;){var a=this.readyFns_.pop();a(this)}},t.prototype.computeYAxes_=function(){var e,a,i,r,n;if(void 0!==this.axes_&&this.user_attrs_.hasOwnProperty("valueRange")===!1)for(e=[],i=0;ii;i++)this.axes_[i].valueWindow=e[i]}for(a=0;al;l++){var h=this.axes_[l],p=this.attributes_.getForAxis("logscale",l),g=this.attributes_.getForAxis("includeZero",l),d=this.attributes_.getForAxis("independentTicks",l);if(i=this.attributes_.seriesForAxis(l),e=!0,r=.1,null!==this.getNumericOption("yRangePad")&&(e=!1,r=this.getNumericOption("yRangePad")/this.plotter_.area.h),0===i.length)h.extremeRange=[0,1];else{for(var u,c,y=1/0,_=-(1/0),v=0;v0&&(y=0),0>_&&(_=0)),y==1/0&&(y=0),_==-(1/0)&&(_=1),a=_-y,0===a&&(0!==_?a=Math.abs(_):(_=1,a=1));var f,x;if(p)if(e)f=_+r*a,x=y;else{var m=Math.exp(Math.log(a)*r);f=_*m,x=y/m}else f=_+r*a,x=y-r*a,e&&!this.getBooleanOption("avoidMinZero")&&(0>x&&y>=0&&(x=0),f>0&&0>=_&&(f=0));h.extremeRange=[x,f]}if(h.valueWindow)h.computedValueRange=[h.valueWindow[0],h.valueWindow[1]];else if(h.valueRange){var D=o(h.valueRange[0])?h.extremeRange[0]:h.valueRange[0],w=o(h.valueRange[1])?h.extremeRange[1]:h.valueRange[1];if(!e)if(h.logscale){var m=Math.exp(Math.log(a)*r);D*=m,w/=m}else a=w-D,D-=a*r,w+=a*r;h.computedValueRange=[D,w]}else h.computedValueRange=h.extremeRange;if(d){h.independentTicks=d;var A=this.optionsViewForAxis_("y"+(l?"2":"")),b=A("ticker");h.ticks=b(h.computedValueRange[0],h.computedValueRange[1],this.plotter_.area.h,A,this),n||(n=h)}}if(void 0===n)throw'Configuration Error: At least one axis has to have the "independentTicks" option activated.';for(var l=0;s>l;l++){var h=this.axes_[l];if(!h.independentTicks){for(var A=this.optionsViewForAxis_("y"+(l?"2":"")),b=A("ticker"),T=n.ticks,E=n.computedValueRange[1]-n.computedValueRange[0],C=h.computedValueRange[1]-h.computedValueRange[0],L=[],P=0;P0&&"e"!=t[a-1]&&"E"!=t[a-1]||t.indexOf("/")>=0||isNaN(parseFloat(t))?e=!0:8==t.length&&t>"19700101"&&"20371231">t&&(e=!0),this.setXAxisOptions_(e)},t.prototype.setXAxisOptions_=function(e){e?(this.attrs_.xValueParser=t.dateParser,this.attrs_.axes.x.valueFormatter=t.dateValueFormatter,this.attrs_.axes.x.ticker=t.dateTicker,this.attrs_.axes.x.axisLabelFormatter=t.dateAxisLabelFormatter):(this.attrs_.xValueParser=function(t){return parseFloat(t)},this.attrs_.axes.x.valueFormatter=function(t){return t},this.attrs_.axes.x.ticker=t.numericTicks,this.attrs_.axes.x.axisLabelFormatter=this.attrs_.axes.x.valueFormatter)},t.prototype.parseCSV_=function(e){var a,i,r=[],n=t.detectLineDelimiter(e),o=e.split(n||"\n"),s=this.getStringOption("delimiter");-1==o[0].indexOf(s)&&o[0].indexOf(" ")>=0&&(s=" ");var l=0;"labels"in 
this.user_attrs_||(l=1,this.attrs_.labels=o[0].split(s),this.attributes_.reparseSeries());for(var h,p=0,g=!1,d=this.attr_("labels").length,u=!1,c=l;c0&&v[0]0;)e=String.fromCharCode(65+(t-1)%26)+e.toLowerCase(),t=Math.floor((t-1)/26);return e},i=e.getNumberOfColumns(),r=e.getNumberOfRows(),n=e.getColumnType(0);if("date"==n||"datetime"==n)this.attrs_.xValueParser=t.dateParser,this.attrs_.axes.x.valueFormatter=t.dateValueFormatter,this.attrs_.axes.x.ticker=t.dateTicker,this.attrs_.axes.x.axisLabelFormatter=t.dateAxisLabelFormatter;else{if("number"!=n)return console.error("only 'date', 'datetime' and 'number' types are supported for column 1 of DataTable input (Got '"+n+"')"),null;this.attrs_.xValueParser=function(t){return parseFloat(t)},this.attrs_.axes.x.valueFormatter=function(t){return t},this.attrs_.axes.x.ticker=t.numericTicks,this.attrs_.axes.x.axisLabelFormatter=this.attrs_.axes.x.valueFormatter}var o,s,l=[],h={},p=!1;for(o=1;i>o;o++){var g=e.getColumnType(o);if("number"==g)l.push(o);else if("string"==g&&this.getBooleanOption("displayAnnotations")){var d=l[l.length-1];h.hasOwnProperty(d)?h[d].push(o):h[d]=[o],p=!0}else console.error("Only 'number' is supported as a dependent type with Gviz. 'string' is only supported if displayAnnotations is true")}var u=[e.getColumnLabel(0)];for(o=0;oo;o++){var v=[];if("undefined"!=typeof e.getValue(o,0)&&null!==e.getValue(o,0)){if(v.push("date"==n||"datetime"==n?e.getValue(o,0).getTime():e.getValue(o,0)),this.getBooleanOption("errorBars"))for(s=0;i-1>s;s++)v.push([e.getValue(o,1+2*s),e.getValue(o,2+2*s)]);else{for(s=0;s0&&v[0]0&&this.setAnnotations(_,!0),this.attributes_.reparseSeries()},t.prototype.cascadeDataDidUpdateEvent_=function(){this.cascadeEvents_("dataDidUpdate",{})},t.prototype.start_=function(){var e=this.file_;if("function"==typeof e&&(e=e()),t.isArrayLike(e))this.rawData_=this.parseArray_(e),this.cascadeDataDidUpdateEvent_(),this.predraw_();else if("object"==typeof e&&"function"==typeof e.getColumnRange)this.parseDataTable_(e),this.cascadeDataDidUpdateEvent_(),this.predraw_();else if("string"==typeof e){var a=t.detectLineDelimiter(e);if(a)this.loadedEvent_(e);else{var i;i=window.XMLHttpRequest?new XMLHttpRequest:new ActiveXObject("Microsoft.XMLHTTP");var r=this;i.onreadystatechange=function(){4==i.readyState&&(200===i.status||0===i.status)&&r.loadedEvent_(i.responseText)},i.open("GET",e,!0),i.send(null)}}else console.error("Unknown data format: "+typeof e)},t.prototype.updateOptions=function(e,a){"undefined"==typeof a&&(a=!1);var i=e.file,r=t.mapLegacyOptions_(e);"rollPeriod"in r&&(this.rollPeriod_=r.rollPeriod),"dateWindow"in r&&(this.dateWindow_=r.dateWindow,"isZoomedIgnoreProgrammaticZoom"in r||(this.zoomed_x_=null!==r.dateWindow)),"valueRange"in r&&!("isZoomedIgnoreProgrammaticZoom"in r)&&(this.zoomed_y_=null!==r.valueRange);var n=t.isPixelChangingOptionList(this.attr_("labels"),r);t.updateDeep(this.user_attrs_,r),this.attributes_.reparseSeries(),i?(this.cascadeEvents_("dataWillUpdate",{}),this.file_=i,a||this.start_()):a||(n?this.predraw_():this.renderGraph_(!1))},t.mapLegacyOptions_=function(t){var e={};for(var a in t)t.hasOwnProperty(a)&&"file"!=a&&t.hasOwnProperty(a)&&(e[a]=t[a]);var i=function(t,a,i){e.axes||(e.axes={}),e.axes[t]||(e.axes[t]={}),e.axes[t][a]=i},r=function(a,r,n){"undefined"!=typeof t[a]&&(console.warn("Option "+a+" is deprecated. Use the "+n+" option for the "+r+" axis instead. (e.g. { axes : { "+r+" : { "+n+" : ... 
} } } (see http://dygraphs.com/per-axis.html for more information."),i(r,n,t[a]),delete e[a])};return r("xValueFormatter","x","valueFormatter"),r("pixelsPerXLabel","x","pixelsPerLabel"),r("xAxisLabelFormatter","x","axisLabelFormatter"),r("xTicker","x","ticker"),r("yValueFormatter","y","valueFormatter"),r("pixelsPerYLabel","y","pixelsPerLabel"),r("yAxisLabelFormatter","y","axisLabelFormatter"),r("yTicker","y","ticker"),r("drawXGrid","x","drawGrid"),r("drawXAxis","x","drawAxis"),r("drawYGrid","y","drawGrid"),r("drawYAxis","y","drawAxis"),r("xAxisLabelWidth","x","axisLabelWidth"),r("yAxisLabelWidth","y","axisLabelWidth"),e},t.prototype.resize=function(t,e){if(!this.resize_lock){this.resize_lock=!0,null===t!=(null===e)&&(console.warn("Dygraph.resize() should be called with zero parameters or two non-NULL parameters. Pretending it was zero."),t=e=null);var a=this.width_,i=this.height_;t?(this.maindiv_.style.width=t+"px",this.maindiv_.style.height=e+"px",this.width_=t,this.height_=e):(this.width_=this.maindiv_.clientWidth,this.height_=this.maindiv_.clientHeight),(a!=this.width_||i!=this.height_)&&(this.resizeElements_(),this.predraw_()),this.resize_lock=!1}},t.prototype.adjustRoll=function(t){this.rollPeriod_=t,this.predraw_()},t.prototype.visibility=function(){for(this.getOption("visibility")||(this.attrs_.visibility=[]);this.getOption("visibility").lengtht||t>=a.length?console.warn("invalid series number in setVisibility: "+t):(a[t]=e,this.predraw_())},t.prototype.size=function(){return{width:this.width_,height:this.height_}},t.prototype.setAnnotations=function(e,a){return t.addAnnotationRule(),this.annotations_=e,this.layout_?(this.layout_.setAnnotations(this.annotations_),void(a||this.predraw_())):void console.warn("Tried to setAnnotations before dygraph was ready. Try setting them in a ready() block. 
See dygraphs.com/tests/annotation.html")},t.prototype.annotations=function(){return this.annotations_},t.prototype.getLabels=function(){var t=this.attr_("labels");return t?t.slice():null},t.prototype.indexFromSetName=function(t){return this.setIndexByName_[t]},t.prototype.ready=function(t){this.is_initial_draw_?this.readyFns_.push(t):t.call(this,this)},t.addAnnotationRule=function(){if(!t.addedAnnotationCSS){var e="border: 1px solid black; background-color: white; text-align: center;",a=document.createElement("style");a.type="text/css",document.getElementsByTagName("head")[0].appendChild(a);for(var i=0;it?"0"+t:""+t},Dygraph.DateAccessorsLocal={getFullYear:function(t){return t.getFullYear()},getMonth:function(t){return t.getMonth()},getDate:function(t){return t.getDate()},getHours:function(t){return t.getHours()},getMinutes:function(t){return t.getMinutes()},getSeconds:function(t){return t.getSeconds()},getMilliseconds:function(t){return t.getMilliseconds()},getDay:function(t){return t.getDay()},makeDate:function(t,e,a,i,r,n,o){return new Date(t,e,a,i,r,n,o)}},Dygraph.DateAccessorsUTC={getFullYear:function(t){return t.getUTCFullYear()},getMonth:function(t){return t.getUTCMonth()},getDate:function(t){return t.getUTCDate()},getHours:function(t){return t.getUTCHours()},getMinutes:function(t){return t.getUTCMinutes()},getSeconds:function(t){return t.getUTCSeconds()},getMilliseconds:function(t){return t.getUTCMilliseconds()},getDay:function(t){return t.getUTCDay()},makeDate:function(t,e,a,i,r,n,o){return new Date(Date.UTC(t,e,a,i,r,n,o))}},Dygraph.hmsString_=function(t,e,a){var i=Dygraph.zeropad,r=i(t)+":"+i(e);return a&&(r+=":"+i(a)),r},Dygraph.dateString_=function(t,e){var a=Dygraph.zeropad,i=e?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal,r=new Date(t),n=i.getFullYear(r),o=i.getMonth(r),s=i.getDate(r),l=i.getHours(r),h=i.getMinutes(r),p=i.getSeconds(r),g=""+n,d=a(o+1),u=a(s),c=3600*l+60*h+p,y=g+"/"+d+"/"+u;return c&&(y+=" "+Dygraph.hmsString_(l,h,p)),y},Dygraph.round_=function(t,e){var a=Math.pow(10,e);return Math.round(t*a)/a},Dygraph.binarySearch=function(t,e,a,i,r){if((null===i||void 0===i||null===r||void 0===r)&&(i=0,r=e.length-1),i>r)return-1;(null===a||void 0===a)&&(a=0);var n,o=function(t){return t>=0&&tt?a>0&&(n=s-1,o(n)&&e[n]l?0>a&&(n=s+1,o(n)&&e[n]>t)?s:Dygraph.binarySearch(t,e,a,s+1,r):-1},Dygraph.dateParser=function(t){var e,a;if((-1==t.search("-")||-1!=t.search("T")||-1!=t.search("Z"))&&(a=Dygraph.dateStrToMillis(t),a&&!isNaN(a)))return a;if(-1!=t.search("-")){for(e=t.replace("-","/","g");-1!=e.search("-");)e=e.replace("-","/");a=Dygraph.dateStrToMillis(e)}else 8==t.length?(e=t.substr(0,4)+"/"+t.substr(4,2)+"/"+t.substr(6,2),a=Dygraph.dateStrToMillis(e)):a=Dygraph.dateStrToMillis(t);return(!a||isNaN(a))&&console.error("Couldn't parse "+t+" as a date"),a},Dygraph.dateStrToMillis=function(t){return new Date(t).getTime()},Dygraph.update=function(t,e){if("undefined"!=typeof e&&null!==e)for(var a in e)e.hasOwnProperty(a)&&(t[a]=e[a]);return t},Dygraph.updateDeep=function(t,e){function a(t){return"object"==typeof Node?t instanceof Node:"object"==typeof t&&"number"==typeof t.nodeType&&"string"==typeof t.nodeName}if("undefined"!=typeof e&&null!==e)for(var i in e)e.hasOwnProperty(i)&&(null===e[i]?t[i]=null:Dygraph.isArrayLike(e[i])?t[i]=e[i].slice():a(e[i])?t[i]=e[i]:"object"==typeof e[i]?(("object"!=typeof t[i]||null===t[i])&&(t[i]={}),Dygraph.updateDeep(t[i],e[i])):t[i]=e[i]);return t},Dygraph.isArrayLike=function(t){var e=typeof 
t;return"object"!=e&&("function"!=e||"function"!=typeof t.item)||null===t||"number"!=typeof t.length||3===t.nodeType?!1:!0},Dygraph.isDateLike=function(t){return"object"!=typeof t||null===t||"function"!=typeof t.getTime?!1:!0},Dygraph.clone=function(t){for(var e=[],a=0;a=e||Dygraph.requestAnimFrame.call(window,function(){var e=(new Date).getTime(),h=e-o;r=n,n=Math.floor(h/a);var p=n-r,g=n+p>s;g||n>=s?(t(s),i()):(0!==p&&t(n),l())})}()};var e={annotationClickHandler:!0,annotationDblClickHandler:!0,annotationMouseOutHandler:!0,annotationMouseOverHandler:!0,axisLabelColor:!0,axisLineColor:!0,axisLineWidth:!0,clickCallback:!0,drawCallback:!0,drawHighlightPointCallback:!0,drawPoints:!0,drawPointCallback:!0,drawXGrid:!0,drawYGrid:!0,fillAlpha:!0,gridLineColor:!0,gridLineWidth:!0,hideOverlayOnMouseOut:!0,highlightCallback:!0,highlightCircleSize:!0,interactionModel:!0,isZoomedIgnoreProgrammaticZoom:!0,labelsDiv:!0,labelsDivStyles:!0,labelsDivWidth:!0,labelsKMB:!0,labelsKMG2:!0,labelsSeparateLines:!0,labelsShowZeroValues:!0,legend:!0,panEdgeFraction:!0,pixelsPerYLabel:!0,pointClickCallback:!0,pointSize:!0,rangeSelectorPlotFillColor:!0,rangeSelectorPlotStrokeColor:!0,showLabelsOnHighlight:!0,showRoller:!0,strokeWidth:!0,underlayCallback:!0,unhighlightCallback:!0,zoomCallback:!0};Dygraph.isPixelChangingOptionList=function(t,a){var i={};if(t)for(var r=1;re?1/Math.pow(t,-e):Math.pow(t,e)};var a=/^rgba?\((\d{1,3}),\s*(\d{1,3}),\s*(\d{1,3})(?:,\s*([01](?:\.\d+)?))?\)$/;Dygraph.toRGB_=function(e){var a=t(e);if(a)return a;var i=document.createElement("div");i.style.backgroundColor=e,i.style.visibility="hidden",document.body.appendChild(i);var r;return r=window.getComputedStyle?window.getComputedStyle(i,null).backgroundColor:i.currentStyle.backgroundColor,document.body.removeChild(i),t(r)},Dygraph.isCanvasSupported=function(t){var e;try{e=t||document.createElement("canvas"),e.getContext("2d")}catch(a){var i=navigator.appVersion.match(/MSIE (\d\.\d)/),r=-1!=navigator.userAgent.toLowerCase().indexOf("opera");return!i||i[1]<6||r?!1:!0}return!0},Dygraph.parseFloat_=function(t,e,a){var i=parseFloat(t);if(!isNaN(i))return i;if(/^ *$/.test(t))return null;if(/^ *nan *$/i.test(t))return 0/0;var r="Unable to parse '"+t+"' as a number";return void 0!==a&&void 0!==e&&(r+=" on line "+(1+(e||0))+" ('"+a+"') of CSV."),console.error(r),null}}(),function(){"use strict";Dygraph.GVizChart=function(t){this.container=t},Dygraph.GVizChart.prototype.draw=function(t,e){this.container.innerHTML="","undefined"!=typeof this.date_graph&&this.date_graph.destroy(),this.date_graph=new Dygraph(this.container,t,e)},Dygraph.GVizChart.prototype.setSelection=function(t){var e=!1;t.length&&(e=t[0].row),this.date_graph.setSelection(e)},Dygraph.GVizChart.prototype.getSelection=function(){var t=[],e=this.date_graph.getSelection();if(0>e)return t;for(var a=this.date_graph.layout_.points,i=0;ii&&2>r&&void 0!==e.lastx_&&-1!=e.lastx_&&Dygraph.Interaction.treatMouseOpAsClick(e,t,a),a.regionWidth=i,a.regionHeight=r},Dygraph.Interaction.startPan=function(t,e,a){var i,r;a.isPanning=!0;var n=e.xAxisRange();if(e.getOptionForAxis("logscale","x")?(a.initialLeftmostDate=Dygraph.log10(n[0]),a.dateRange=Dygraph.log10(n[1])-Dygraph.log10(n[0])):(a.initialLeftmostDate=n[0],a.dateRange=n[1]-n[0]),a.xUnitsPerPixel=a.dateRange/(e.plotter_.area.w-1),e.getNumericOption("panEdgeFraction")){var 
o=e.width_*e.getNumericOption("panEdgeFraction"),s=e.xAxisExtremes(),l=e.toDomXCoord(s[0])-o,h=e.toDomXCoord(s[1])+o,p=e.toDataXCoord(l),g=e.toDataXCoord(h);a.boundedDates=[p,g];var d=[],u=e.height_*e.getNumericOption("panEdgeFraction");for(i=0;ia.boundedDates[1]&&(i-=r-a.boundedDates[1],r=i+a.dateRange),e.getOptionForAxis("logscale","x")?e.dateWindow_=[Math.pow(Dygraph.LOG_SCALE,i),Math.pow(Dygraph.LOG_SCALE,r)]:e.dateWindow_=[i,r],a.is2DPan)for(var n=a.dragEndY-a.dragStartY,o=0;oi?Dygraph.VERTICAL:Dygraph.HORIZONTAL,e.drawZoomRect_(a.dragDirection,a.dragStartX,a.dragEndX,a.dragStartY,a.dragEndY,a.prevDragDirection,a.prevEndX,a.prevEndY),a.prevEndX=a.dragEndX,a.prevEndY=a.dragEndY,a.prevDragDirection=a.dragDirection},Dygraph.Interaction.treatMouseOpAsClick=function(t,e,a){for(var i=t.getFunctionOption("clickCallback"),r=t.getFunctionOption("pointClickCallback"),n=null,o=-1,s=Number.MAX_VALUE,l=0;lp)&&(s=p,o=l)}var g=t.getNumericOption("highlightCircleSize")+2;if(g*g>=s&&(n=t.selPoints_[o]),n){var d={cancelable:!0,point:n,canvasx:a.dragEndX,canvasy:a.dragEndY},u=t.cascadeEvents_("pointClick",d);if(u)return;r&&r.call(t,e,n)}var d={cancelable:!0,xval:t.lastx_,pts:t.selPoints_,canvasx:a.dragEndX,canvasy:a.dragEndY};t.cascadeEvents_("click",d)||i&&i.call(t,e,t.lastx_,t.selPoints_)},Dygraph.Interaction.endZoom=function(t,e,a){e.clearZoomRect_(),a.isZooming=!1,Dygraph.Interaction.maybeTreatMouseOpAsClick(t,e,a);var i=e.getArea();if(a.regionWidth>=10&&a.dragDirection==Dygraph.HORIZONTAL){var r=Math.min(a.dragStartX,a.dragEndX),n=Math.max(a.dragStartX,a.dragEndX);r=Math.max(r,i.x),n=Math.min(n,i.x+i.w),n>r&&e.doZoomX_(r,n),a.cancelNextDblclick=!0}else if(a.regionHeight>=10&&a.dragDirection==Dygraph.VERTICAL){var o=Math.min(a.dragStartY,a.dragEndY),s=Math.max(a.dragStartY,a.dragEndY);o=Math.max(o,i.y),s=Math.min(s,i.y+i.h),s>o&&e.doZoomY_(o,s),a.cancelNextDblclick=!0}a.dragStartX=null,a.dragStartY=null},Dygraph.Interaction.startTouch=function(t,e,a){t.preventDefault(),t.touches.length>1&&(a.startTimeForDoubleTapMs=null);for(var i=[],r=0;r=2){a.initialPinchCenter={pageX:.5*(i[0].pageX+i[1].pageX),pageY:.5*(i[0].pageY+i[1].pageY),dataX:.5*(i[0].dataX+i[1].dataX),dataY:.5*(i[0].dataY+i[1].dataY)};var o=180/Math.PI*Math.atan2(a.initialPinchCenter.pageY-i[0].pageY,i[0].pageX-a.initialPinchCenter.pageX);o=Math.abs(o),o>90&&(o=90-o),a.touchDirections={x:67.5>o,y:o>22.5}}a.initialRange={x:e.xAxisRange(),y:e.yAxisRange()}},Dygraph.Interaction.moveTouch=function(t,e,a){a.startTimeForDoubleTapMs=null;var i,r=[];for(i=0;i=2){var c=s[1].pageX-l.pageX;d=(r[1].pageX-o.pageX)/c;var y=s[1].pageY-l.pageY;u=(r[1].pageY-o.pageY)/y}d=Math.min(8,Math.max(.125,d)),u=Math.min(8,Math.max(.125,u));var _=!1;if(a.touchDirections.x&&(e.dateWindow_=[l.dataX-h.dataX+(a.initialRange.x[0]-l.dataX)/d,l.dataX-h.dataX+(a.initialRange.x[1]-l.dataX)/d],_=!0),a.touchDirections.y)for(i=0;1>i;i++){var v=e.axes_[i],f=e.attributes_.getForAxis("logscale",i);f||(v.valueWindow=[l.dataY-h.dataY+(a.initialRange.y[0]-l.dataY)/u,l.dataY-h.dataY+(a.initialRange.y[1]-l.dataY)/u],_=!0)}if(e.drawGraph_(!1),_&&r.length>1&&e.getFunctionOption("zoomCallback")){var x=e.xAxisRange();e.getFunctionOption("zoomCallback").call(e,x[0],x[1],e.yAxisRanges())}},Dygraph.Interaction.endTouch=function(t,e,a){if(0!==t.touches.length)Dygraph.Interaction.startTouch(t,e,a);else if(1==t.changedTouches.length){var i=(new 
Date).getTime(),r=t.changedTouches[0];a.startTimeForDoubleTapMs&&i-a.startTimeForDoubleTapMs<500&&a.doubleTapX&&Math.abs(a.doubleTapX-r.screenX)<50&&a.doubleTapY&&Math.abs(a.doubleTapY-r.screenY)<50?e.resetZoom():(a.startTimeForDoubleTapMs=i,a.doubleTapX=r.screenX,a.doubleTapY=r.screenY)}};var e=function(t,e,a){return e>t?e-t:t>a?t-a:0},a=function(t,a){var i=Dygraph.findPos(a.canvas_),r={left:i.x,right:i.x+a.canvas_.offsetWidth,top:i.y,bottom:i.y+a.canvas_.offsetHeight},n={x:Dygraph.pageX(t),y:Dygraph.pageY(t)},o=e(n.x,r.left,r.right),s=e(n.y,r.top,r.bottom);return Math.max(o,s)};Dygraph.Interaction.defaultModel={mousedown:function(e,i,r){if(!e.button||2!=e.button){r.initializeMouseDown(e,i,r),e.altKey||e.shiftKey?Dygraph.startPan(e,i,r):Dygraph.startZoom(e,i,r);var n=function(e){if(r.isZooming){var n=a(e,i);t>n?Dygraph.moveZoom(e,i,r):null!==r.dragEndX&&(r.dragEndX=null,r.dragEndY=null,i.clearZoomRect_())}else r.isPanning&&Dygraph.movePan(e,i,r)},o=function(t){r.isZooming?null!==r.dragEndX?Dygraph.endZoom(t,i,r):Dygraph.Interaction.maybeTreatMouseOpAsClick(t,i,r):r.isPanning&&Dygraph.endPan(t,i,r),Dygraph.removeEvent(document,"mousemove",n),Dygraph.removeEvent(document,"mouseup",o),r.destroy()};i.addAndTrackEvent(document,"mousemove",n),i.addAndTrackEvent(document,"mouseup",o)}},willDestroyContextMyself:!0,touchstart:function(t,e,a){Dygraph.Interaction.startTouch(t,e,a)},touchmove:function(t,e,a){Dygraph.Interaction.moveTouch(t,e,a)},touchend:function(t,e,a){Dygraph.Interaction.endTouch(t,e,a)},dblclick:function(t,e,a){if(a.cancelNextDblclick)return void(a.cancelNextDblclick=!1);var i={canvasx:a.dragEndX,canvasy:a.dragEndY};e.cascadeEvents_("dblclick",i)||t.altKey||t.shiftKey||e.resetZoom()}},Dygraph.DEFAULT_ATTRS.interactionModel=Dygraph.Interaction.defaultModel,Dygraph.defaultInteractionModel=Dygraph.Interaction.defaultModel,Dygraph.endZoom=Dygraph.Interaction.endZoom,Dygraph.moveZoom=Dygraph.Interaction.moveZoom,Dygraph.startZoom=Dygraph.Interaction.startZoom,Dygraph.endPan=Dygraph.Interaction.endPan,Dygraph.movePan=Dygraph.Interaction.movePan,Dygraph.startPan=Dygraph.Interaction.startPan,Dygraph.Interaction.nonInteractiveModel_={mousedown:function(t,e,a){a.initializeMouseDown(t,e,a)},mouseup:Dygraph.Interaction.maybeTreatMouseOpAsClick},Dygraph.Interaction.dragIsPanInteractionModel={mousedown:function(t,e,a){a.initializeMouseDown(t,e,a),Dygraph.startPan(t,e,a)},mousemove:function(t,e,a){a.isPanning&&Dygraph.movePan(t,e,a)},mouseup:function(t,e,a){a.isPanning&&Dygraph.endPan(t,e,a)}}}(),function(){"use strict";Dygraph.TickList=void 0,Dygraph.Ticker=void 0,Dygraph.numericLinearTicks=function(t,e,a,i,r,n){var o=function(t){return"logscale"===t?!1:i(t)};return Dygraph.numericTicks(t,e,a,o,r,n)},Dygraph.numericTicks=function(t,e,a,i,r,n){var o,s,l,h,p=i("pixelsPerLabel"),g=[];if(n)for(o=0;o=h/4){for(var y=u;y>=d;y--){var _=Dygraph.PREFERRED_LOG_TICK_VALUES[y],v=Math.log(_/t)/Math.log(e/t)*a,f={v:_};null===c?c={tickValue:_,pixel_coord:v}:Math.abs(v-c.pixel_coord)>=p?c={tickValue:_,pixel_coord:v}:f.label="",g.push(f)}g.reverse()}}if(0===g.length){var x,m,D=i("labelsKMG2");D?(x=[1,2,4,8,16,32,64,128,256],m=16):(x=[1,2,5,10,20,50,100],m=10);var w,A,b,T,E=Math.ceil(a/p),C=Math.abs(e-t)/E,L=Math.floor(Math.log(C)/Math.log(m)),P=Math.pow(m,L);for(s=0;sp));s++);for(A>b&&(w*=-1),o=0;h>=o;o++)l=A+o*w,g.push({v:l})}}var 
S=i("axisLabelFormatter");for(o=0;o=0?Dygraph.getDateAxis(t,e,o,i,r):[]},Dygraph.SECONDLY=0,Dygraph.TWO_SECONDLY=1,Dygraph.FIVE_SECONDLY=2,Dygraph.TEN_SECONDLY=3,Dygraph.THIRTY_SECONDLY=4,Dygraph.MINUTELY=5,Dygraph.TWO_MINUTELY=6,Dygraph.FIVE_MINUTELY=7,Dygraph.TEN_MINUTELY=8,Dygraph.THIRTY_MINUTELY=9,Dygraph.HOURLY=10,Dygraph.TWO_HOURLY=11,Dygraph.SIX_HOURLY=12,Dygraph.DAILY=13,Dygraph.TWO_DAILY=14,Dygraph.WEEKLY=15,Dygraph.MONTHLY=16,Dygraph.QUARTERLY=17,Dygraph.BIANNUAL=18,Dygraph.ANNUAL=19,Dygraph.DECADAL=20,Dygraph.CENTENNIAL=21,Dygraph.NUM_GRANULARITIES=22,Dygraph.DATEFIELD_Y=0,Dygraph.DATEFIELD_M=1,Dygraph.DATEFIELD_D=2,Dygraph.DATEFIELD_HH=3,Dygraph.DATEFIELD_MM=4,Dygraph.DATEFIELD_SS=5,Dygraph.DATEFIELD_MS=6,Dygraph.NUM_DATEFIELDS=7,Dygraph.TICK_PLACEMENT=[],Dygraph.TICK_PLACEMENT[Dygraph.SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:1,spacing:1e3},Dygraph.TICK_PLACEMENT[Dygraph.TWO_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:2,spacing:2e3},Dygraph.TICK_PLACEMENT[Dygraph.FIVE_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:5,spacing:5e3},Dygraph.TICK_PLACEMENT[Dygraph.TEN_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:10,spacing:1e4},Dygraph.TICK_PLACEMENT[Dygraph.THIRTY_SECONDLY]={datefield:Dygraph.DATEFIELD_SS,step:30,spacing:3e4},Dygraph.TICK_PLACEMENT[Dygraph.MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:1,spacing:6e4},Dygraph.TICK_PLACEMENT[Dygraph.TWO_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:2,spacing:12e4},Dygraph.TICK_PLACEMENT[Dygraph.FIVE_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:5,spacing:3e5},Dygraph.TICK_PLACEMENT[Dygraph.TEN_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:10,spacing:6e5},Dygraph.TICK_PLACEMENT[Dygraph.THIRTY_MINUTELY]={datefield:Dygraph.DATEFIELD_MM,step:30,spacing:18e5},Dygraph.TICK_PLACEMENT[Dygraph.HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:1,spacing:36e5},Dygraph.TICK_PLACEMENT[Dygraph.TWO_HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:2,spacing:72e5},Dygraph.TICK_PLACEMENT[Dygraph.SIX_HOURLY]={datefield:Dygraph.DATEFIELD_HH,step:6,spacing:216e5},Dygraph.TICK_PLACEMENT[Dygraph.DAILY]={datefield:Dygraph.DATEFIELD_D,step:1,spacing:864e5},Dygraph.TICK_PLACEMENT[Dygraph.TWO_DAILY]={datefield:Dygraph.DATEFIELD_D,step:2,spacing:1728e5},Dygraph.TICK_PLACEMENT[Dygraph.WEEKLY]={datefield:Dygraph.DATEFIELD_D,step:7,spacing:6048e5},Dygraph.TICK_PLACEMENT[Dygraph.MONTHLY]={datefield:Dygraph.DATEFIELD_M,step:1,spacing:2629817280},Dygraph.TICK_PLACEMENT[Dygraph.QUARTERLY]={datefield:Dygraph.DATEFIELD_M,step:3,spacing:216e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.BIANNUAL]={datefield:Dygraph.DATEFIELD_M,step:6,spacing:432e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.ANNUAL]={datefield:Dygraph.DATEFIELD_Y,step:1,spacing:864e5*365.2524},Dygraph.TICK_PLACEMENT[Dygraph.DECADAL]={datefield:Dygraph.DATEFIELD_Y,step:10,spacing:315578073600},Dygraph.TICK_PLACEMENT[Dygraph.CENTENNIAL]={datefield:Dygraph.DATEFIELD_Y,step:100,spacing:3155780736e3},Dygraph.PREFERRED_LOG_TICK_VALUES=function(){for(var t=[],e=-39;39>=e;e++)for(var a=Math.pow(10,e),i=1;9>=i;i++){var r=a*i;t.push(r)}return t}(),Dygraph.pickDateTickGranularity=function(t,e,a,i){for(var r=i("pixelsPerLabel"),n=0;n=r)return n}return-1},Dygraph.numDateTicks=function(t,e,a){var i=Dygraph.TICK_PLACEMENT[a].spacing;return Math.round(1*(e-t)/i)},Dygraph.getDateAxis=function(t,e,a,i,r){var n=i("axisLabelFormatter"),o=i("labelsUTC"),s=o?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal,l=Dygraph.TICK_PLACEMENT[a].datefield,h=Dygraph.TICK_PLACEMENT[a].step,p=Dygraph.TICK_PLACEMENT[a].spacing,g=new 
Date(t),d=[];d[Dygraph.DATEFIELD_Y]=s.getFullYear(g),d[Dygraph.DATEFIELD_M]=s.getMonth(g),d[Dygraph.DATEFIELD_D]=s.getDate(g),d[Dygraph.DATEFIELD_HH]=s.getHours(g),d[Dygraph.DATEFIELD_MM]=s.getMinutes(g),d[Dygraph.DATEFIELD_SS]=s.getSeconds(g),d[Dygraph.DATEFIELD_MS]=s.getMilliseconds(g);var u=d[l]%h;a==Dygraph.WEEKLY&&(u=s.getDay(g)),d[l]-=u;for(var c=l+1;cv&&(v+=p,_=new Date(v));e>=v;)y.push({v:v,label:n.call(r,_,a,i,r)}),v+=p,_=new Date(v);else for(t>v&&(d[l]+=h,_=s.makeDate.apply(null,d),v=_.getTime());e>=v;)(a>=Dygraph.DAILY||s.getHours(_)%h===0)&&y.push({v:v,label:n.call(r,_,a,i,r)}),d[l]+=h,_=s.makeDate.apply(null,d),v=_.getTime();return y},Dygraph&&Dygraph.DEFAULT_ATTRS&&Dygraph.DEFAULT_ATTRS.axes&&Dygraph.DEFAULT_ATTRS.axes.x&&Dygraph.DEFAULT_ATTRS.axes.y&&Dygraph.DEFAULT_ATTRS.axes.y2&&(Dygraph.DEFAULT_ATTRS.axes.x.ticker=Dygraph.dateTicker,Dygraph.DEFAULT_ATTRS.axes.y.ticker=Dygraph.numericTicks,Dygraph.DEFAULT_ATTRS.axes.y2.ticker=Dygraph.numericTicks)}(),Dygraph.Plugins={},Dygraph.Plugins.Annotations=function(){"use strict";var t=function(){this.annotations_=[]};return t.prototype.toString=function(){return"Annotations Plugin"},t.prototype.activate=function(t){return{clearChart:this.clearChart,didDrawChart:this.didDrawChart}},t.prototype.detachLabels=function(){for(var t=0;to.x+o.w||h.canvasyo.y+o.h)){var p=h.annotation,g=6;p.hasOwnProperty("tickHeight")&&(g=p.tickHeight);var d=document.createElement("div");for(var u in r)r.hasOwnProperty(u)&&(d.style[u]=r[u]);p.hasOwnProperty("icon")||(d.className="dygraphDefaultAnnotation"),p.hasOwnProperty("cssClass")&&(d.className+=" "+p.cssClass);var c=p.hasOwnProperty("width")?p.width:16,y=p.hasOwnProperty("height")?p.height:16;if(p.hasOwnProperty("icon")){var _=document.createElement("img");_.src=p.icon,_.width=c,_.height=y,d.appendChild(_)}else h.annotation.hasOwnProperty("shortText")&&d.appendChild(document.createTextNode(h.annotation.shortText));var v=h.canvasx-c/2;d.style.left=v+"px";var f=0;if(p.attachAtBottom){var x=o.y+o.h-y-g;s[v]?x-=s[v]:s[v]=0,s[v]+=g+y,f=x}else f=h.canvasy-y-g;d.style.top=f+"px",d.style.width=c+"px",d.style.height=y+"px",d.title=h.annotation.text,d.style.color=e.colorsMap_[h.name],d.style.borderColor=e.colorsMap_[h.name],p.div=d,e.addAndTrackEvent(d,"click",n("clickHandler","annotationClickHandler",h,this)),e.addAndTrackEvent(d,"mouseover",n("mouseOverHandler","annotationMouseOverHandler",h,this)),e.addAndTrackEvent(d,"mouseout",n("mouseOutHandler","annotationMouseOutHandler",h,this)),e.addAndTrackEvent(d,"dblclick",n("dblClickHandler","annotationDblClickHandler",h,this)),i.appendChild(d),this.annotations_.push(d);var m=t.drawingContext;if(m.save(),m.strokeStyle=e.colorsMap_[h.name],m.beginPath(),p.attachAtBottom){var x=f+y;m.moveTo(h.canvasx,x),m.lineTo(h.canvasx,x+g)}else m.moveTo(h.canvasx,h.canvasy),m.lineTo(h.canvasx,h.canvasy-2-g);m.closePath(),m.stroke(),m.restore()}}},t.prototype.destroy=function(){this.detachLabels()},t}(),Dygraph.Plugins.Axes=function(){"use strict";var t=function(){this.xlabels_=[],this.ylabels_=[]};return t.prototype.toString=function(){return"Axes Plugin"},t.prototype.activate=function(t){return{layout:this.layout,clearChart:this.clearChart,willDrawChart:this.willDrawChart}},t.prototype.layout=function(t){var e=t.dygraph;if(e.getOptionForAxis("drawAxis","y")){var a=e.getOptionForAxis("axisLabelWidth","y")+2*e.getOptionForAxis("axisTickSize","y");t.reserveSpaceLeft(a)}if(e.getOptionForAxis("drawAxis","x")){var 
i;i=e.getOption("xAxisHeight")?e.getOption("xAxisHeight"):e.getOptionForAxis("axisLabelFontSize","x")+2*e.getOptionForAxis("axisTickSize","x"),t.reserveSpaceBottom(i)}if(2==e.numAxes()){if(e.getOptionForAxis("drawAxis","y2")){var a=e.getOptionForAxis("axisLabelWidth","y2")+2*e.getOptionForAxis("axisTickSize","y2");t.reserveSpaceRight(a)}}else e.numAxes()>2&&e.error("Only two y-axes are supported at this time. (Trying to use "+e.numAxes()+")")},t.prototype.detachLabels=function(){function t(t){for(var e=0;e0){var x=i.numAxes(),m=[f("y"),f("y2")];for(l=0;l<_.yticks.length;l++){if(s=_.yticks[l],"function"==typeof s)return;n=v.x;var D=1,w="y1",A=m[0];1==s[0]&&(n=v.x+v.w,D=-1,w="y2",A=m[1]);var b=A("axisLabelFontSize");o=v.y+s[1]*v.h,r=y(s[2],"y",2==x?w:null);var T=o-b/2;0>T&&(T=0),T+b+3>d?r.style.bottom="0":r.style.top=T+"px",0===s[0]?(r.style.left=v.x-A("axisLabelWidth")-A("axisTickSize")+"px",r.style.textAlign="right"):1==s[0]&&(r.style.left=v.x+v.w+A("axisTickSize")+"px",r.style.textAlign="left"),r.style.width=A("axisLabelWidth")+"px",p.appendChild(r),this.ylabels_.push(r)}var E=this.ylabels_[0],b=i.getOptionForAxis("axisLabelFontSize","y"),C=parseInt(E.style.top,10)+b;C>d-b&&(E.style.top=parseInt(E.style.top,10)-b/2+"px")}var L;if(i.getOption("drawAxesAtZero")){var P=i.toPercentXCoord(0);(P>1||0>P||isNaN(P))&&(P=0),L=e(v.x+P*v.w)}else L=e(v.x);h.strokeStyle=i.getOptionForAxis("axisLineColor","y"),h.lineWidth=i.getOptionForAxis("axisLineWidth","y"),h.beginPath(),h.moveTo(L,a(v.y)),h.lineTo(L,a(v.y+v.h)),h.closePath(),h.stroke(),2==i.numAxes()&&(h.strokeStyle=i.getOptionForAxis("axisLineColor","y2"),h.lineWidth=i.getOptionForAxis("axisLineWidth","y2"),h.beginPath(),h.moveTo(a(v.x+v.w),a(v.y)),h.lineTo(a(v.x+v.w),a(v.y+v.h)),h.closePath(),h.stroke())}if(i.getOptionForAxis("drawAxis","x")){if(_.xticks){var A=f("x");for(l=0;l<_.xticks.length;l++){s=_.xticks[l],n=v.x+s[0]*v.w,o=v.y+v.h,r=y(s[1],"x"),r.style.textAlign="center",r.style.top=o+A("axisTickSize")+"px";var S=n-A("axisLabelWidth")/2;S+A("axisLabelWidth")>g&&(S=g-A("axisLabelWidth"),r.style.textAlign="right"),0>S&&(S=0,r.style.textAlign="left"),r.style.left=S+"px",r.style.width=A("axisLabelWidth")+"px", +p.appendChild(r),this.xlabels_.push(r)}}h.strokeStyle=i.getOptionForAxis("axisLineColor","x"),h.lineWidth=i.getOptionForAxis("axisLineWidth","x"),h.beginPath();var O;if(i.getOption("drawAxesAtZero")){var P=i.toPercentYCoord(0,0);(P>1||0>P)&&(P=1),O=a(v.y+P*v.h)}else O=a(v.y+v.h);h.moveTo(e(v.x),O),h.lineTo(e(v.x+v.w),O),h.closePath(),h.stroke()}h.restore()}},t}(),Dygraph.Plugins.ChartLabels=function(){"use strict";var t=function(){this.title_div_=null,this.xlabel_div_=null,this.ylabel_div_=null,this.y2label_div_=null};t.prototype.toString=function(){return"ChartLabels Plugin"},t.prototype.activate=function(t){return{layout:this.layout,didDrawChart:this.didDrawChart}};var e=function(t){var e=document.createElement("div");return e.style.position="absolute",e.style.left=t.x+"px",e.style.top=t.y+"px",e.style.width=t.w+"px",e.style.height=t.h+"px",e};t.prototype.detachLabels_=function(){for(var t=[this.title_div_,this.xlabel_div_,this.ylabel_div_,this.y2label_div_],e=0;e=2);for(o=h.yticks,l.save(),n=0;n=2;for(y&&l.installPattern(_),l.strokeStyle=s.getOptionForAxis("gridLineColor","x"),l.lineWidth=s.getOptionForAxis("gridLineWidth","x"),n=0;n/g,">")};return t.prototype.select=function(e){var a=e.selectedX,i=e.selectedPoints,r=e.selectedRow,n=e.dygraph.getOption("legend");if("never"===n)return 
void(this.legend_div_.style.display="none");if("follow"===n){var o=e.dygraph.plotter_.area,s=e.dygraph.getOption("labelsDivWidth"),l=e.dygraph.getOptionForAxis("axisLabelWidth","y"),h=i[0].x*o.w+20,p=i[0].y*o.h-20;h+s+1>window.scrollX+window.innerWidth&&(h=h-40-s-(l-o.x)),e.dygraph.graphDiv.appendChild(this.legend_div_),this.legend_div_.style.left=l+h+"px",this.legend_div_.style.top=p+"px"}var g=t.generateLegendHTML(e.dygraph,a,i,this.one_em_width_,r);this.legend_div_.innerHTML=g,this.legend_div_.style.display=""},t.prototype.deselect=function(e){var i=e.dygraph.getOption("legend");"always"!==i&&(this.legend_div_.style.display="none");var r=a(this.legend_div_);this.one_em_width_=r;var n=t.generateLegendHTML(e.dygraph,void 0,void 0,r,null);this.legend_div_.innerHTML=n},t.prototype.didDrawChart=function(t){this.deselect(t)},t.prototype.predraw=function(t){if(this.is_generated_div_){t.dygraph.graphDiv.appendChild(this.legend_div_);var e=t.dygraph.plotter_.area,a=t.dygraph.getOption("labelsDivWidth");this.legend_div_.style.left=e.x+e.w-a-1+"px",this.legend_div_.style.top=e.y+"px",this.legend_div_.style.width=a+"px"}},t.prototype.destroy=function(){this.legend_div_=null},t.generateLegendHTML=function(t,a,r,n,o){if(t.getOption("showLabelsOnHighlight")!==!0)return"";var s,l,h,p,g,d=t.getLabels();if("undefined"==typeof a){if("always"!=t.getOption("legend"))return"";for(l=t.getOption("labelsSeparateLines"),s="",h=1;h":" "),g=t.getOption("strokePattern",d[h]),p=e(g,u.color,n),s+=""+p+" "+i(d[h])+"")}return s}var c=t.optionsViewForAxis_("x"),y=c("valueFormatter");s=y.call(t,a,c,d[0],t,o,0),""!==s&&(s+=":");var _=[],v=t.numAxes();for(h=0;v>h;h++)_[h]=t.optionsViewForAxis_("y"+(h?1+h:""));var f=t.getOption("labelsShowZeroValues");l=t.getOption("labelsSeparateLines");var x=t.getHighlightSeries();for(h=0;h");var u=t.getPropertiesForSeries(m.name),D=_[u.axis-1],w=D("valueFormatter"),A=w.call(t,m.yval,D,m.name,t,o,d.indexOf(m.name)),b=m.name==x?" class='highlight'":"";s+=" "+i(m.name)+": "+A+""}}return s},e=function(t,e,a){var i=/MSIE/.test(navigator.userAgent)&&!window.opera;if(i)return"—";if(!t||t.length<=1)return'
      ';var r,n,o,s,l,h=0,p=0,g=[];for(r=0;r<=t.length;r++)h+=t[r%t.length];if(l=Math.floor(a/(h-t[0])),l>1){for(r=0;rn;n++)for(r=0;p>r;r+=2)o=g[r%g.length],s=r';return d},t}(),Dygraph.Plugins.RangeSelector=function(){"use strict";var t=function(){this.isIE_=/MSIE/.test(navigator.userAgent)&&!window.opera,this.hasTouchInterface_="undefined"!=typeof TouchEvent,this.isMobileDevice_=/mobile|android/gi.test(navigator.appVersion),this.interfaceCreated_=!1};return t.prototype.toString=function(){return"RangeSelector Plugin"},t.prototype.activate=function(t){return this.dygraph_=t,this.isUsingExcanvas_=t.isUsingExcanvas_,this.getOption_("showRangeSelector")&&this.createInterface_(),{layout:this.reserveSpace_,predraw:this.renderStaticLayer_,didDrawChart:this.renderInteractiveLayer_}},t.prototype.destroy=function(){this.bgcanvas_=null,this.fgcanvas_=null,this.leftZoomHandle_=null,this.rightZoomHandle_=null,this.iePanOverlay_=null},t.prototype.getOption_=function(t,e){return this.dygraph_.getOption(t,e)},t.prototype.setDefaultOption_=function(t,e){this.dygraph_.attrs_[t]=e},t.prototype.createInterface_=function(){this.createCanvases_(),this.isUsingExcanvas_&&this.createIEPanOverlay_(),this.createZoomHandles_(),this.initInteraction_(),this.getOption_("animatedZooms")&&(console.warn("Animated zooms and range selector are not compatible; disabling animatedZooms."),this.dygraph_.updateOptions({animatedZooms:!1},!0)),this.interfaceCreated_=!0,this.addToGraph_()},t.prototype.addToGraph_=function(){var t=this.graphDiv_=this.dygraph_.graphDiv;t.appendChild(this.bgcanvas_),t.appendChild(this.fgcanvas_),t.appendChild(this.leftZoomHandle_),t.appendChild(this.rightZoomHandle_)},t.prototype.removeFromGraph_=function(){var t=this.graphDiv_;t.removeChild(this.bgcanvas_),t.removeChild(this.fgcanvas_),t.removeChild(this.leftZoomHandle_),t.removeChild(this.rightZoomHandle_),this.graphDiv_=null},t.prototype.reserveSpace_=function(t){this.getOption_("showRangeSelector")&&t.reserveSpaceBottom(this.getOption_("rangeSelectorHeight")+4)},t.prototype.renderStaticLayer_=function(){this.updateVisibility_()&&(this.resize_(),this.drawStaticLayer_())},t.prototype.renderInteractiveLayer_=function(){this.updateVisibility_()&&!this.isChangingRange_&&(this.placeZoomHandles_(),this.drawInteractiveLayer_())},t.prototype.updateVisibility_=function(){var t=this.getOption_("showRangeSelector");if(t)this.interfaceCreated_?this.graphDiv_&&this.graphDiv_.parentNode||this.addToGraph_():this.createInterface_();else if(this.graphDiv_){this.removeFromGraph_();var e=this.dygraph_;setTimeout(function(){e.width_=0,e.resize()},1)}return t},t.prototype.resize_=function(){function t(t,e,a){var i=Dygraph.getContextPixelRatio(e);t.style.top=a.y+"px",t.style.left=a.x+"px",t.width=a.w*i,t.height=a.h*i,t.style.width=a.w+"px",t.style.height=a.h+"px",1!=i&&e.scale(i,i)}var 
e=this.dygraph_.layout_.getPlotArea(),a=0;this.dygraph_.getOptionForAxis("drawAxis","x")&&(a=this.getOption_("xAxisHeight")||this.getOption_("axisLabelFontSize")+2*this.getOption_("axisTickSize")),this.canvasRect_={x:e.x,y:e.y+e.h+a+4,w:e.w,h:this.getOption_("rangeSelectorHeight")},t(this.bgcanvas_,this.bgcanvas_ctx_,this.canvasRect_),t(this.fgcanvas_,this.fgcanvas_ctx_,this.canvasRect_)},t.prototype.createCanvases_=function(){this.bgcanvas_=Dygraph.createCanvas(),this.bgcanvas_.className="dygraph-rangesel-bgcanvas",this.bgcanvas_.style.position="absolute",this.bgcanvas_.style.zIndex=9,this.bgcanvas_ctx_=Dygraph.getContext(this.bgcanvas_),this.fgcanvas_=Dygraph.createCanvas(),this.fgcanvas_.className="dygraph-rangesel-fgcanvas",this.fgcanvas_.style.position="absolute",this.fgcanvas_.style.zIndex=9,this.fgcanvas_.style.cursor="default",this.fgcanvas_ctx_=Dygraph.getContext(this.fgcanvas_)},t.prototype.createIEPanOverlay_=function(){this.iePanOverlay_=document.createElement("div"),this.iePanOverlay_.style.position="absolute",this.iePanOverlay_.style.backgroundColor="white",this.iePanOverlay_.style.filter="alpha(opacity=0)",this.iePanOverlay_.style.display="none",this.iePanOverlay_.style.cursor="move",this.fgcanvas_.appendChild(this.iePanOverlay_)},t.prototype.createZoomHandles_=function(){var t=new Image;t.className="dygraph-rangesel-zoomhandle",t.style.position="absolute",t.style.zIndex=10,t.style.visibility="hidden",t.style.cursor="col-resize",/MSIE 7/.test(navigator.userAgent)?(t.width=7,t.height=14,t.style.backgroundColor="white",t.style.border="1px solid #333333"):(t.width=9,t.height=16,t.src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAkAAAAQCAYAAADESFVDAAAAAXNSR0IArs4c6QAAAAZiS0dEANAAzwDP4Z7KegAAAAlwSFlzAAAOxAAADsQBlSsOGwAAAAd0SU1FB9sHGw0cMqdt1UwAAAAZdEVYdENvbW1lbnQAQ3JlYXRlZCB3aXRoIEdJTVBXgQ4XAAAAaElEQVQoz+3SsRFAQBCF4Z9WJM8KCDVwownl6YXsTmCUsyKGkZzcl7zkz3YLkypgAnreFmDEpHkIwVOMfpdi9CEEN2nGpFdwD03yEqDtOgCaun7sqSTDH32I1pQA2Pb9sZecAxc5r3IAb21d6878xsAAAAAASUVORK5CYII="),this.isMobileDevice_&&(t.width*=2,t.height*=2),this.leftZoomHandle_=t,this.rightZoomHandle_=t.cloneNode(!1)},t.prototype.initInteraction_=function(){var t,e,a,i,r,n,o,s,l,h,p,g,d,u,c=this,y=document,_=0,v=null,f=!1,x=!1,m=!this.isMobileDevice_&&!this.isUsingExcanvas_,D=new Dygraph.IFrameTarp;t=function(t){var e=c.dygraph_.xAxisExtremes(),a=(e[1]-e[0])/c.canvasRect_.w,i=e[0]+(t.leftHandlePos-c.canvasRect_.x)*a,r=e[0]+(t.rightHandlePos-c.canvasRect_.x)*a;return[i,r]},e=function(t){return Dygraph.cancelEvent(t),f=!0,_=t.clientX,v=t.target?t.target:t.srcElement,("mousedown"===t.type||"dragstart"===t.type)&&(Dygraph.addEvent(y,"mousemove",a),Dygraph.addEvent(y,"mouseup",i)),c.fgcanvas_.style.cursor="col-resize",D.cover(),!0},a=function(t){if(!f)return!1;Dygraph.cancelEvent(t);var e=t.clientX-_;if(Math.abs(e)<4)return!0;_=t.clientX;var a,i=c.getZoomHandleStatus_();v==c.leftZoomHandle_?(a=i.leftHandlePos+e,a=Math.min(a,i.rightHandlePos-v.width-3),a=Math.max(a,c.canvasRect_.x)):(a=i.rightHandlePos+e,a=Math.min(a,c.canvasRect_.x+c.canvasRect_.w),a=Math.max(a,i.leftHandlePos+v.width+3));var n=v.width/2;return v.style.left=a-n+"px",c.drawInteractiveLayer_(),m&&r(),!0},i=function(t){return f?(f=!1,D.uncover(),Dygraph.removeEvent(y,"mousemove",a),Dygraph.removeEvent(y,"mouseup",i),c.fgcanvas_.style.cursor="default",m||r(),!0):!1},r=function(){try{var e=c.getZoomHandleStatus_();if(c.isChangingRange_=!0,e.isZoomed){var a=t(e);c.dygraph_.doZoomXDates_(a[0],a[1])}else 
c.dygraph_.resetZoom()}finally{c.isChangingRange_=!1}},n=function(t){if(c.isUsingExcanvas_)return t.srcElement==c.iePanOverlay_;var e=c.leftZoomHandle_.getBoundingClientRect(),a=e.left+e.width/2;e=c.rightZoomHandle_.getBoundingClientRect();var i=e.left+e.width/2;return t.clientX>a&&t.clientX=c.canvasRect_.x+c.canvasRect_.w?(r=c.canvasRect_.x+c.canvasRect_.w,i=r-n):(i+=e,r+=e);var o=c.leftZoomHandle_.width/2;return c.leftZoomHandle_.style.left=i-o+"px",c.rightZoomHandle_.style.left=r-o+"px",c.drawInteractiveLayer_(),m&&h(),!0},l=function(t){return x?(x=!1,Dygraph.removeEvent(y,"mousemove",s),Dygraph.removeEvent(y,"mouseup",l),m||h(),!0):!1},h=function(){try{c.isChangingRange_=!0,c.dygraph_.dateWindow_=t(c.getZoomHandleStatus_()),c.dygraph_.drawGraph_(!1)}finally{c.isChangingRange_=!1}},p=function(t){if(!f&&!x){var e=n(t)?"move":"default";e!=c.fgcanvas_.style.cursor&&(c.fgcanvas_.style.cursor=e)}},g=function(t){"touchstart"==t.type&&1==t.targetTouches.length?e(t.targetTouches[0])&&Dygraph.cancelEvent(t):"touchmove"==t.type&&1==t.targetTouches.length?a(t.targetTouches[0])&&Dygraph.cancelEvent(t):i(t)},d=function(t){"touchstart"==t.type&&1==t.targetTouches.length?o(t.targetTouches[0])&&Dygraph.cancelEvent(t):"touchmove"==t.type&&1==t.targetTouches.length?s(t.targetTouches[0])&&Dygraph.cancelEvent(t):l(t)},u=function(t,e){for(var a=["touchstart","touchend","touchmove","touchcancel"],i=0;it;t++){var s=this.getOption_("showInRangeSelector",r[t]);n[t]=s,null!==s&&(o=!0)}if(!o)for(t=0;t1&&(g=h.rollingAverage(g,e.rollPeriod(),p)),l.push(g)}var d=[];for(t=0;t0)&&(v=Math.min(v,x),f=Math.max(f,x))}var m=.25;if(a)for(f=Dygraph.log10(f),f+=f*m,v=Dygraph.log10(v),t=0;tthis.canvasRect_.x||a+10&&t[r][0]>o;)i--,r--}return i>=a?[a,i]:[0,t.length-1]},t.parseFloat=function(t){return null===t?0/0:t}}(),function(){"use strict";Dygraph.DataHandlers.DefaultHandler=function(){};var t=Dygraph.DataHandlers.DefaultHandler;t.prototype=new Dygraph.DataHandler,t.prototype.extractSeries=function(t,e,a){for(var i=[],r=a.get("logscale"),n=0;n=s&&(s=null),i.push([o,s])}return i},t.prototype.rollingAverage=function(t,e,a){e=Math.min(e,t.length);var i,r,n,o,s,l=[];if(1==e)return t;for(i=0;ir;r++)n=t[r][1],null===n||isNaN(n)||(s++,o+=t[r][1]);s?l[i]=[t[i][0],o/s]:l[i]=[t[i][0],null]}return l},t.prototype.getExtremeYValues=function(t,e,a){for(var i,r=null,n=null,o=0,s=t.length-1,l=o;s>=l;l++)i=t[l][1],null===i||isNaN(i)||((null===n||i>n)&&(n=i),(null===r||r>i)&&(r=i));return[r,n]}}(),function(){"use strict";Dygraph.DataHandlers.DefaultFractionHandler=function(){};var t=Dygraph.DataHandlers.DefaultFractionHandler;t.prototype=new Dygraph.DataHandlers.DefaultHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s,l,h=[],p=100,g=a.get("logscale"),d=0;d=0&&(n-=t[i-e][2][0],o-=t[i-e][2][1]);var l=t[i][0],h=o?n/o:0;r[i]=[l,s*h]}return r}}(),function(){"use strict";Dygraph.DataHandlers.BarsHandler=function(){Dygraph.DataHandler.call(this)},Dygraph.DataHandlers.BarsHandler.prototype=new Dygraph.DataHandler;var t=Dygraph.DataHandlers.BarsHandler;t.prototype.extractSeries=function(t,e,a){},t.prototype.rollingAverage=function(t,e,a){},t.prototype.onPointsCreated_=function(t,e){for(var a=0;a=l;l++)if(i=t[l][1],null!==i&&!isNaN(i)){var h=t[l][2][0],p=t[l][2][1];h>i&&(h=i),i>p&&(p=i),(null===n||p>n)&&(n=p),(null===r||r>h)&&(r=h)}return[r,n]},t.prototype.onLineEvaluated=function(t,e,a){for(var i,r=0;r=0){var 
g=t[l-e];null===g[1]||isNaN(g[1])||(r-=g[2][0],o-=g[1],n-=g[2][1],s-=1)}s?p[l]=[t[l][0],1*o/s,[1*r/s,1*n/s]]:p[l]=[t[l][0],null,[null,null]]}return p}}(),function(){"use strict";Dygraph.DataHandlers.ErrorBarsHandler=function(){};var t=Dygraph.DataHandlers.ErrorBarsHandler;t.prototype=new Dygraph.DataHandlers.BarsHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s=[],l=a.get("sigma"),h=a.get("logscale"),p=0;pr;r++)n=t[r][1],null===n||isNaN(n)||(l++,s+=n,p+=Math.pow(t[r][2][2],2));l?(h=Math.sqrt(p)/l,g=s/l,d[i]=[t[i][0],g,[g-u*h,g+u*h]]):(o=1==e?t[i][1]:null,d[i]=[t[i][0],o,[o,o]])}return d}}(),function(){"use strict";Dygraph.DataHandlers.FractionsBarsHandler=function(){};var t=Dygraph.DataHandlers.FractionsBarsHandler;t.prototype=new Dygraph.DataHandlers.BarsHandler,t.prototype.extractSeries=function(t,e,a){for(var i,r,n,o,s,l,h,p,g=[],d=100,u=a.get("sigma"),c=a.get("logscale"),y=0;y=0&&(p-=t[n-e][2][2],g-=t[n-e][2][3]);var u=t[n][0],c=g?p/g:0;if(h)if(g){var y=0>c?0:c,_=g,v=l*Math.sqrt(y*(1-y)/_+l*l/(4*_*_)),f=1+l*l/g;i=(y+l*l/(2*g)-v)/f,r=(y+l*l/(2*g)+v)/f,s[n]=[u,y*d,[i*d,r*d]]}else s[n]=[u,0,[0,0]];else o=g?l*Math.sqrt(c*(1-c)/g):1,s[n]=[u,d*c,[d*(c-o),d*(c+o)]]}return s}}(); +//# sourceMappingURL=dygraph-combined.js.map \ No newline at end of file diff --git a/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css new file mode 100644 index 000000000..4745b2fc2 --- /dev/null +++ b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/dygraph.css @@ -0,0 +1,8 @@ + +div .dygraphs input[type="text"] { + width: 25px; +} + +div .qt .dygraph-axis-label { + font-size: 11px; +} \ No newline at end of file diff --git a/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js new file mode 100644 index 000000000..2df07a9b8 --- /dev/null +++ b/previous_versions/v0.4.0/libs/dygraphs-1.1.1/shapes.js @@ -0,0 +1,123 @@ +/** + * @license + * Copyright 2011 Dan Vanderkam (danvdk@gmail.com) + * MIT-licensed (http://opensource.org/licenses/MIT) + */ + +/** + * @fileoverview + * Including this file will add several additional shapes to Dygraph.Circles + * which can be passed to drawPointCallback. + * See tests/custom-circles.html for usage. + */ + +(function() { + +/** + * @param {!CanvasRenderingContext2D} ctx the canvas context + * @param {number} sides the number of sides in the shape. + * @param {number} radius the radius of the image. + * @param {number} cx center x coordate + * @param {number} cy center y coordinate + * @param {number=} rotationRadians the shift of the initial angle, in radians. + * @param {number=} delta the angle shift for each line. If missing, creates a + * regular polygon. + */ +var regularShape = function( + ctx, sides, radius, cx, cy, rotationRadians, delta) { + rotationRadians = rotationRadians || 0; + delta = delta || Math.PI * 2 / sides; + + ctx.beginPath(); + var initialAngle = rotationRadians; + var angle = initialAngle; + + var computeCoordinates = function() { + var x = cx + (Math.sin(angle) * radius); + var y = cy + (-Math.cos(angle) * radius); + return [x, y]; + }; + + var initialCoordinates = computeCoordinates(); + var x = initialCoordinates[0]; + var y = initialCoordinates[1]; + ctx.moveTo(x, y); + + for (var idx = 0; idx < sides; idx++) { + angle = (idx == sides - 1) ? 
initialAngle : (angle + delta); + var coords = computeCoordinates(); + ctx.lineTo(coords[0], coords[1]); + } + ctx.fill(); + ctx.stroke(); +}; + +/** + * TODO(danvk): be more specific on the return type. + * @param {number} sides + * @param {number=} rotationRadians + * @param {number=} delta + * @return {Function} + * @private + */ +var shapeFunction = function(sides, rotationRadians, delta) { + return function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + ctx.fillStyle = "white"; + regularShape(ctx, sides, radius, cx, cy, rotationRadians, delta); + }; +}; + +var customCircles = { + TRIANGLE : shapeFunction(3), + SQUARE : shapeFunction(4, Math.PI / 4), + DIAMOND : shapeFunction(4), + PENTAGON : shapeFunction(5), + HEXAGON : shapeFunction(6), + CIRCLE : function(g, name, ctx, cx, cy, color, radius) { + ctx.beginPath(); + ctx.strokeStyle = color; + ctx.fillStyle = "white"; + ctx.arc(cx, cy, radius, 0, 2 * Math.PI, false); + ctx.fill(); + ctx.stroke(); + }, + STAR : shapeFunction(5, 0, 4 * Math.PI / 5), + PLUS : function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy); + ctx.lineTo(cx - radius, cy); + ctx.closePath(); + ctx.stroke(); + + ctx.beginPath(); + ctx.moveTo(cx, cy + radius); + ctx.lineTo(cx, cy - radius); + ctx.closePath(); + ctx.stroke(); + }, + EX : function(g, name, ctx, cx, cy, color, radius) { + ctx.strokeStyle = color; + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy + radius); + ctx.lineTo(cx - radius, cy - radius); + ctx.closePath(); + ctx.stroke(); + + ctx.beginPath(); + ctx.moveTo(cx + radius, cy - radius); + ctx.lineTo(cx - radius, cy + radius); + ctx.closePath(); + ctx.stroke(); + } +}; + +for (var k in customCircles) { + if (!customCircles.hasOwnProperty(k)) continue; + Dygraph.Circles[k] = customCircles[k]; +} + +})(); diff --git a/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js b/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js new file mode 100644 index 000000000..3cd03913f --- /dev/null +++ b/previous_versions/v0.4.0/libs/dygraphs-binding-1.1.1.6/dygraphs.js @@ -0,0 +1,789 @@ + +// polyfill indexOf for IE8 +if (!Array.prototype.indexOf) { + Array.prototype.indexOf = function(elt /*, from*/) { + var len = this.length >>> 0; + + var from = Number(arguments[1]) || 0; + from = (from < 0) + ? 
Math.ceil(from) + : Math.floor(from); + if (from < 0) + from += len; + + for (; from < len; from++) { + if (from in this && + this[from] === elt) + return from; + } + return -1; + }; +} + +HTMLWidgets.widget({ + + name: "dygraphs", + + type: "output", + + factory: function(el, width, height) { + + // reference to dygraph + var dygraph = null; + + // reference to widget global groups + var groups = this.groups; + + // add qt style if we are running under Qt + if (window.navigator.userAgent.indexOf(" Qt/") > 0) + el.className += " qt"; + + return { + + renderValue: function(x) { + + // reference to this for closures + var thiz = this; + + // get dygraph attrs and populate file field + var attrs = x.attrs; + attrs.file = x.data; + + // disable zoom interaction except for clicks + if (attrs.disableZoom) { + attrs.interactionModel = Dygraph.Interaction.nonInteractiveModel_; + } + + // convert non-arrays to arrays + for (var index = 0; index < attrs.file.length; index++) { + if (!$.isArray(attrs.file[index])) + attrs.file[index] = [].concat(attrs.file[index]); + } + + // resolve "auto" legend behavior + if (x.attrs.legend == "auto") { + if (x.data.length <= 2) + x.attrs.legend = "onmouseover"; + else + x.attrs.legend = "always"; + } + + if (x.format == "date") { + + // set appropriated function in case of fixed tz + if ((attrs.axes.x.axisLabelFormatter === undefined) && x.fixedtz) + attrs.axes.x.axisLabelFormatter = this.xAxisLabelFormatterFixedTZ(x.tzone); + + if ((attrs.axes.x.valueFormatter === undefined) && x.fixedtz) + attrs.axes.x.valueFormatter = this.xValueFormatterFixedTZ(x.scale, x.tzone); + + if ((attrs.axes.x.ticker === undefined) && x.fixedtz) + attrs.axes.x.ticker = this.customDateTickerFixedTZ(x.tzone); + + // provide an automatic x value formatter if none is already specified + if ((attrs.axes.x.valueFormatter === undefined) && (x.fixedtz != true)) + attrs.axes.x.valueFormatter = this.xValueFormatter(x.scale); + + // convert time to js time + attrs.file[0] = attrs.file[0].map(function(value) { + return thiz.normalizeDateValue(x.scale, value, x.fixedtz); + }); + if (attrs.dateWindow != null) { + attrs.dateWindow = attrs.dateWindow.map(function(value) { + var date = thiz.normalizeDateValue(x.scale, value, x.fixedtz); + return date.getTime(); + }); + } + } + + + // transpose array + attrs.file = HTMLWidgets.transposeArray2D(attrs.file); + + // add drawCallback for group + if (x.group != null) + this.addGroupDrawCallback(x); + + // add shading and event callback if necessary + this.addShadingCallback(x); + this.addEventCallback(x); + this.addZoomCallback(x); + + // disable y-axis touch events on mobile phones + if (attrs.mobileDisableYTouch !== false && this.isMobilePhone()) { + // create default interaction model if necessary + if (!attrs.interactionModel) + attrs.interactionModel = Dygraph.Interaction.defaultModel; + // disable y touch direction + attrs.interactionModel.touchstart = function(event, dygraph, context) { + Dygraph.defaultInteractionModel.touchstart(event, dygraph, context); + context.touchDirections = { x: true, y: false }; + }; + } + + // create plugins + if (x.plugins) { + attrs.plugins = []; + for (var plugin in x.plugins) { + if (x.plugins.hasOwnProperty(plugin)) { + + // get plugin options + var options = x.plugins[plugin]; + + // create plugin and add to dygraph + var p = new Dygraph.Plugins[plugin](options); + attrs.plugins.push(p); + } + } + } + + // custom plotter + if (x.plotter) { + attrs.plotter = Dygraph.Plotters[x.plotter]; + } + + // custom data handler 
+ if (x.dataHandler) { + attrs.dataHandler = Dygraph.DataHandlers[x.dataHandler]; + } + + // custom circles + if (x.pointShape) { + if (typeof x.pointShape === 'string') { + attrs.drawPointCallback = Dygraph.Circles[x.pointShape.toUpperCase()]; + attrs.drawHighlightPointCallback = Dygraph.Circles[x.pointShape.toUpperCase()]; + } else { + for (var s in x.pointShape) { + if (x.pointShape.hasOwnProperty(s)) { + attrs.series[s].drawPointCallback = Dygraph.Circles[x.pointShape[s].toUpperCase()]; + attrs.series[s].drawHighlightPointCallback = Dygraph.Circles[x.pointShape[s].toUpperCase()]; + } + } + } + } + + // if there is no existing dygraph perform initialization + if (!dygraph) { + + // subscribe to custom shown event (fired by ioslides to trigger + // shiny reactivity but we can use it as well). this is necessary + // because if a dygraph starts out as display:none it has height + // and width == 0 and this doesn't change when it becomes visible + $(el).closest('slide').on('shown', function() { + if (dygraph) + dygraph.resize(); + }); + + // do the same for reveal.js + $(el).closest('section.slide').on('shown', function() { + if (dygraph) + dygraph.resize(); + }); + + // redraw on R Markdown {.tabset} tab visibility changed + var tab = $(el).closest('div.tab-pane'); + if (tab !== null) { + var tabID = tab.attr('id'); + var tabAnchor = $('a[data-toggle="tab"][href="#' + tabID + '"]'); + if (tabAnchor !== null) { + tabAnchor.on('shown.bs.tab', function() { + if (dygraph) + dygraph.resize(); + }); + } + } + // add default font for viewer mode + if (this.queryVar("viewer_pane") === "1") + document.body.style.fontFamily = "Arial, sans-serif"; + + // inject css if necessary + if (x.css != null) { + var style = document.createElement('style'); + style.type = 'text/css'; + if (style.styleSheet) + style.styleSheet.cssText = x.css; + else + style.appendChild(document.createTextNode(x.css)); + document.getElementsByTagName("head")[0].appendChild(style); + } + + } else { + + // retain the userDateWindow if requested + if (dygraph.userDateWindow != null + && attrs.retainDateWindow == true) { + attrs.dateWindow = dygraph.xAxisRange(); + } + + // remove it from groups if it's there + if (x.group != null && groups[x.group] != null) { + var index = groups[x.group].indexOf(dygraph); + if (index != -1) + groups[x.group].splice(index, 1); + } + + // destroy the existing dygraph + dygraph.destroy(); + dygraph = null; + } + + // create the dygraph and add it to it's group (if any) + dygraph = thiz.dygraph = new Dygraph(el, attrs.file, attrs); + dygraph.userDateWindow = attrs.dateWindow; + if (x.group != null) + groups[x.group].push(dygraph); + + // add shiny inputs for date window and click + if (HTMLWidgets.shinyMode) { + var isDate = x.format == "date"; + this.addClickShinyInput(el.id, isDate); + this.addDateWindowShinyInput(el.id, isDate); + } + + // set annotations + if (x.annotations != null) { + dygraph.ready(function() { + if (x.format == "date") { + x.annotations.map(function(annotation) { + var date = thiz.normalizeDateValue(x.scale, annotation.x, x.fixedtz); + annotation.x = date.getTime(); + }); + } + dygraph.setAnnotations(x.annotations); + }); + } + + }, + + customDateTickerFixedTZ : function(tz){ + return function(t,e,a,i,r) { + var a=Dygraph.pickDateTickGranularity(t,e,a,i); + if(a >= 0){ + + var n=i("axisLabelFormatter"), + o=i("labelsUTC"), + s=o?Dygraph.DateAccessorsUTC:Dygraph.DateAccessorsLocal; + l=Dygraph.TICK_PLACEMENT[a].datefield; + h=Dygraph.TICK_PLACEMENT[a].step; + 
p=Dygraph.TICK_PLACEMENT[a].spacing; + + var y = []; + var d = moment(t); + d.tz(tz); + d.millisecond(0); + + if(l > Dygraph.DATEFIELD_M){ + var x; + if (l === Dygraph.DATEFIELD_SS) { // seconds + x = d.second(); + d.second(x - x % h); + } else if(l === Dygraph.DATEFIELD_MM){ + d.second(0) + x = d.minute(); + d.minute(x - x % h); + } else if(l === Dygraph.DATEFIELD_HH){ + d.second(0); + d.minute(0); + x = d.hour(); + d.hour(x - x % h); + } else if(l === Dygraph.DATEFIELD_D){ + d.second(0); + d.minute(0); + d.hour(0); + if (h == 7) { // one week + d.startOf('week'); + } + } + + v = d.valueOf(); + _=moment(v).tz(tz); + + // For spacings coarser than two-hourly, we want to ignore daylight + // savings transitions to get consistent ticks. For finer-grained ticks, + // it's essential to show the DST transition in all its messiness. + var start_offset_min = moment(v).tz(tz).zone(); + var check_dst = (p >= Dygraph.TICK_PLACEMENT[Dygraph.TWO_HOURLY].spacing); + + if(a<=Dygraph.HOURLY){ + for(t>v&&(v+=p,_=moment(v).tz(tz));e>=v;){ + y.push({v:v,label:n(_,a,i,r)}); + v+=p; + _=moment(v).tz(tz); + } + }else{ + for(t>v&&(v+=p,_=moment(v).tz(tz));e>=v;){ + + // This ensures that we stay on the same hourly "rhythm" across + // daylight savings transitions. Without this, the ticks could get off + // by an hour. See tests/daylight-savings.html or issue 147. + if (check_dst && _.zone() != start_offset_min) { + var delta_min = _.zone() - start_offset_min; + v += delta_min * 60 * 1000; + _= moment(v).tz(tz); + start_offset_min = _.zone(); + + // Check whether we've backed into the previous timezone again. + // This can happen during a "spring forward" transition. In this case, + // it's best to skip this tick altogether (we may be shooting for a + // non-existent time like the 2AM that's skipped) and go to the next + // one. 
+ if (moment(v + p).tz(tz).zone() != start_offset_min) { + v += p; + _= moment(v).tz(tz); + start_offset_min = _.zone(); + } + } + + (a>=Dygraph.DAILY||_.get('hour')%h===0)&&y.push({v:v,label:n(_,a,i,r)}); + v+=p; + _=moment(v).tz(tz); + } + } + }else{ + var start_year = moment(t).tz(tz).year(); + var end_year = moment(e).tz(tz).year(); + var start_month = moment(t).tz(tz).month(); + + if(l === Dygraph.DATEFIELD_M){ + var step_month = h; + for (var ii = start_year; ii <= end_year; ii++) { + for (var j = 0; j < 12;) { + var dt = moment(new Date(ii, j, 1)).tz(tz); + // fix some tz bug + dt.year(ii); + dt.month(j); + dt.date(1); + dt.hour(0); + v = dt.valueOf(); + y.push({v:v,label:n(moment(v).tz(tz),a,i,r)}); + j+=step_month; + } + } + }else{ + var step_year = h; + for (var ii = start_year; ii <= end_year;) { + var dt = moment(new Date(ii, 1, 1)).tz(tz); + // fix some tz bug + dt.year(ii); + dt.month(j); + dt.date(1); + dt.hour(0); + v = dt.valueOf(); + y.push({v:v,label:n(moment(v).tz(tz),a,i,r)}); + ii+=step_year; + } + } + } + return y; + }else{ + return []; + } + }; + }, + + xAxisLabelFormatterFixedTZ : function(tz){ + + return function dateAxisFormatter(date, granularity){ + var mmnt = moment(date).tz(tz); + if (granularity >= Dygraph.DECADAL){ + return mmnt.format('YYYY'); + }else{ + if(granularity >= Dygraph.MONTHLY){ + return mmnt.format('MMM YYYY'); + }else{ + var frac = mmnt.hour() * 3600 + mmnt.minute() * 60 + mmnt.second() + mmnt.millisecond(); + if (frac === 0 || granularity >= Dygraph.DAILY) { + return mmnt.format('DD MMM'); + } else { + if (mmnt.second()) { + return mmnt.format('HH:mm:ss'); + } else { + return mmnt.format('HH:mm'); + } + } + } + + } + } + }, + + xValueFormatterFixedTZ: function(scale, tz) { + + return function(millis) { + var mmnt = moment(millis).tz(tz); + if (scale == "yearly") + return mmnt.format('YYYY') + ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "quarterly") + return mmnt.fquarter(1) + ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "monthly") + return mmnt.format('MMM, YYYY')+ ' (' + mmnt.zoneAbbr() + ')'; + else if (scale == "daily" || scale == "weekly") + return mmnt.format('MMM, DD, YYYY')+ ' (' + mmnt.zoneAbbr() + ')'; + else + return mmnt.format('dddd, MMMM DD, YYYY HH:mm:ss')+ ' (' + mmnt.zoneAbbr() + ')'; + } + }, + + xValueFormatter: function(scale) { + + var monthNames = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", + "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]; + + return function(millis) { + var date = new Date(millis); + if (scale == "yearly") + return date.getFullYear(); + else if (scale == "quarterly") + return moment(millis).fquarter(1); + else if (scale == "monthly") + return monthNames[date.getMonth()] + ', ' + date.getFullYear(); + else if (scale == "daily" || scale == "weekly") + return monthNames[date.getMonth()] + ', ' + + date.getDate() + ', ' + + date.getFullYear(); + else + return date.toLocaleString(); + } + }, + + addZoomCallback: function(x) { + + // alias this + var thiz = this; + + // get attrs + var attrs = x.attrs; + + // check for an existing zoomCallback + var prevZoomCallback = attrs["zoomCallback"]; + + attrs.zoomCallback = function(minDate, maxDate, yRanges) { + + // call existing + if (prevZoomCallback) + prevZoomCallback(minDate, maxDate, yRanges); + + // record user date window (or lack thereof) + if (dygraph.xAxisExtremes()[0] != minDate || + dygraph.xAxisExtremes()[1] != maxDate) { + dygraph.userDateWindow = [minDate, maxDate]; + } else { + dygraph.userDateWindow = null; + } + + // record in group if 
necessary + if (x.group != null && groups[x.group] != null) { + var group = groups[x.group]; + for(var i = 0; i=0.1){ + var dashLength = dashArray[dashIndex++%dashCount]; + if (dashLength > distRemaining) dashLength = distRemaining; + var xStep = Math.sqrt( dashLength*dashLength / (1 + slope*slope) ); + if (dx<0) xStep = -xStep; + x += xStep + y += slope*xStep; + canvas[draw ? 'lineTo' : 'moveTo'](x,y); + distRemaining -= dashLength; + draw = !draw; + } + canvas.stroke(); + }, + + setFontSize: function(canvas, size) { + var cFont = canvas.font; + var parts = cFont.split(' '); + if (parts.length === 2) + canvas.font = size + 'px ' + parts[1]; + else if (parts.length === 3) + canvas.font = parts[0] + ' ' + size + 'px ' + parts[2]; + }, + + // Returns the value of a GET variable + queryVar: function(name) { + return decodeURI(window.location.search.replace( + new RegExp("^(?:.*[&\\?]" + + encodeURI(name).replace(/[\.\+\*]/g, "\\$&") + + "(?:\\=([^&]*))?)?.*$", "i"), + "$1")); + }, + + // We deal exclusively in UTC dates within R, however dygraphs deals + // exclusively in the local time zone. Therefore, in order to plot date + // labels that make sense to the user when we are dealing with days, + // months or years we need to convert the UTC date value to a local time + // value that "looks like" the equivilant UTC value. To do this we add the + // timezone offset to the UTC date. + // Don't use in case of fixedtz + normalizeDateValue: function(scale, value, fixedtz) { + var date = new Date(value); + if (scale != "minute" && scale != "hourly" && scale != "seconds" && !fixedtz) { + var localAsUTC = date.getTime() + (date.getTimezoneOffset() * 60000); + date = new Date(localAsUTC); + } + return date; + }, + + // safely detect rendering on a mobile phone + isMobilePhone: function() { + try + { + return ! 
window.matchMedia("only screen and (min-width: 768px)").matches; + } + catch(e) { + return false; + } + }, + + + resize: function(width, height) { + if (dygraph) + dygraph.resize(); + }, + + // export dygraph so other code can get a hold of it + dygraph: null + + }; + }, + + // track groups globally + groups: {} + +}); + diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf new file mode 100644 index 000000000..35acda2fa Binary files /dev/null and b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/fontawesome/fontawesome-webfont.ttf differ diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css new file mode 100644 index 000000000..8e5bb8a3c --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-bookdown.css @@ -0,0 +1,99 @@ +.book .book-header h1 { + padding-left: 20px; + padding-right: 20px; +} +.book .book-header.fixed { + position: fixed; + right: 0; + top: 0; + left: 0; + border-bottom: 1px solid rgba(0,0,0,.07); +} +span.search-highlight { + background-color: #ffff88; +} +@media (min-width: 600px) { + .book.with-summary .book-header.fixed { + left: 300px; + } +} +@media (max-width: 1240px) { + .book .book-body.fixed { + top: 50px; + } + .book .book-body.fixed .body-inner { + top: auto; + } +} +@media (max-width: 600px) { + .book.with-summary .book-header.fixed { + left: calc(100% - 60px); + min-width: 300px; + } + .book.with-summary .book-body { + transform: none; + left: calc(100% - 60px); + min-width: 300px; + } + .book .book-body.fixed { + top: 0; + } +} + +.book .book-body.fixed .body-inner { + top: 50px; +} +.book .book-body .page-wrapper .page-inner section.normal sub, .book .book-body .page-wrapper .page-inner section.normal sup { + font-size: 85%; +} + +@media print { + .book .book-summary, .book .book-body .book-header, .fa { + display: none !important; + } + .book .book-body.fixed { + left: 0px; + } + .book .book-body,.book .book-body .body-inner, .book.with-summary { + overflow: visible !important; + } +} +.kable_wrapper { + border-spacing: 20px 0; + border-collapse: separate; + border: none; + margin: auto; +} +.kable_wrapper > tbody > tr > td { + vertical-align: top; +} +.book .book-body .page-wrapper .page-inner section.normal table tr.header { + border-top-width: 2px; +} +.book .book-body .page-wrapper .page-inner section.normal table tr:last-child td { + border-bottom-width: 2px; +} +.book .book-body .page-wrapper .page-inner section.normal table td, .book .book-body .page-wrapper .page-inner section.normal table th { + border-left: none; + border-right: none; +} +.book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr, .book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr > td { + border-top: none; +} +.book .book-body .page-wrapper .page-inner section.normal table.kable_wrapper > tbody > tr:last-child > td { + border-bottom: none; +} + +div.theorem, div.lemma, div.corollary, div.proposition, div.conjecture { + font-style: italic; +} +span.theorem, span.lemma, span.corollary, span.proposition, span.conjecture { + font-style: normal; +} +div.proof:after { + content: "\25a2"; + float: right; +} +.header-section-number { + padding-right: .5em; +} diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css 
b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css new file mode 100644 index 000000000..87236b4c0 --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-fontsettings.css @@ -0,0 +1,292 @@ +/* + * Theme 1 + */ +.color-theme-1 .dropdown-menu { + background-color: #111111; + border-color: #7e888b; +} +.color-theme-1 .dropdown-menu .dropdown-caret .caret-inner { + border-bottom: 9px solid #111111; +} +.color-theme-1 .dropdown-menu .buttons { + border-color: #7e888b; +} +.color-theme-1 .dropdown-menu .button { + color: #afa790; +} +.color-theme-1 .dropdown-menu .button:hover { + color: #73553c; +} +/* + * Theme 2 + */ +.color-theme-2 .dropdown-menu { + background-color: #2d3143; + border-color: #272a3a; +} +.color-theme-2 .dropdown-menu .dropdown-caret .caret-inner { + border-bottom: 9px solid #2d3143; +} +.color-theme-2 .dropdown-menu .buttons { + border-color: #272a3a; +} +.color-theme-2 .dropdown-menu .button { + color: #62677f; +} +.color-theme-2 .dropdown-menu .button:hover { + color: #f4f4f5; +} +.book .book-header .font-settings .font-enlarge { + line-height: 30px; + font-size: 1.4em; +} +.book .book-header .font-settings .font-reduce { + line-height: 30px; + font-size: 1em; +} +.book.color-theme-1 .book-body { + color: #704214; + background: #f3eacb; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section { + background: #f3eacb; +} +.book.color-theme-2 .book-body { + color: #bdcadb; + background: #1c1f2b; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section { + background: #1c1f2b; +} +.book.font-size-0 .book-body .page-inner section { + font-size: 1.2rem; +} +.book.font-size-1 .book-body .page-inner section { + font-size: 1.4rem; +} +.book.font-size-2 .book-body .page-inner section { + font-size: 1.6rem; +} +.book.font-size-3 .book-body .page-inner section { + font-size: 2.2rem; +} +.book.font-size-4 .book-body .page-inner section { + font-size: 4rem; +} +.book.font-family-0 { + font-family: Georgia, serif; +} +.book.font-family-1 { + font-family: "Helvetica Neue", Helvetica, Arial, sans-serif; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal { + color: #704214; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal a { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h2, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h3, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h4, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h5, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h6 { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h2 { + border-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal h6 { + color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal hr { + background-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal blockquote { + border-color: #c4b29f; + opacity: 0.9; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code { + background: #fdf6e3; + color: #657b83; + border-color: #f8df9c; +} 
+.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal .highlight { + background-color: inherit; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table th, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table td { + border-color: #f5d06c; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table tr { + color: inherit; + background-color: #fdf6e3; + border-color: #444444; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n) { + background-color: #fbeecb; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal { + color: #bdcadb; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal a { + color: #3eb1d0; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h2, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h3, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h4, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h5, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h6 { + color: #fffffa; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h1, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h2 { + border-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal h6 { + color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal hr { + background-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal blockquote { + border-color: #373b4e; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code { + color: #9dbed8; + background: #2d3143; + border-color: #2d3143; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal .highlight { + background-color: #282a39; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table th, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table td { + border-color: #3b3f54; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table tr { + color: #b6c2d2; + background-color: #2d3143; + border-color: #3b3f54; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n) { + background-color: #35394b; +} +.book.color-theme-1 .book-header { + color: #afa790; + background: transparent; +} +.book.color-theme-1 .book-header .btn { + color: #afa790; +} +.book.color-theme-1 .book-header .btn:hover { + color: #73553c; + background: none; +} +.book.color-theme-1 .book-header h1 { + color: #704214; +} +.book.color-theme-2 .book-header { + color: #7e888b; + background: transparent; +} +.book.color-theme-2 .book-header .btn { + color: #3b3f54; +} +.book.color-theme-2 .book-header .btn:hover { + color: #fffff5; + background: none; +} +.book.color-theme-2 .book-header h1 { + color: #bdcadb; +} +.book.color-theme-1 .book-body .navigation { + color: #afa790; +} +.book.color-theme-1 .book-body .navigation:hover { + color: #73553c; +} +.book.color-theme-2 .book-body .navigation { + color: #383f52; +} +.book.color-theme-2 .book-body .navigation:hover { + color: #fffff5; +} +/* + * Theme 1 + */ +.book.color-theme-1 .book-summary { + color: 
#afa790; + background: #111111; + border-right: 1px solid rgba(0, 0, 0, 0.07); +} +.book.color-theme-1 .book-summary .book-search { + background: transparent; +} +.book.color-theme-1 .book-summary .book-search input, +.book.color-theme-1 .book-summary .book-search input:focus { + border: 1px solid transparent; +} +.book.color-theme-1 .book-summary ul.summary li.divider { + background: #7e888b; + box-shadow: none; +} +.book.color-theme-1 .book-summary ul.summary li i.fa-check { + color: #33cc33; +} +.book.color-theme-1 .book-summary ul.summary li.done > a { + color: #877f6a; +} +.book.color-theme-1 .book-summary ul.summary li a, +.book.color-theme-1 .book-summary ul.summary li span { + color: #877f6a; + background: transparent; + font-weight: normal; +} +.book.color-theme-1 .book-summary ul.summary li.active > a, +.book.color-theme-1 .book-summary ul.summary li a:hover { + color: #704214; + background: transparent; + font-weight: normal; +} +/* + * Theme 2 + */ +.book.color-theme-2 .book-summary { + color: #bcc1d2; + background: #2d3143; + border-right: none; +} +.book.color-theme-2 .book-summary .book-search { + background: transparent; +} +.book.color-theme-2 .book-summary .book-search input, +.book.color-theme-2 .book-summary .book-search input:focus { + border: 1px solid transparent; +} +.book.color-theme-2 .book-summary ul.summary li.divider { + background: #272a3a; + box-shadow: none; +} +.book.color-theme-2 .book-summary ul.summary li i.fa-check { + color: #33cc33; +} +.book.color-theme-2 .book-summary ul.summary li.done > a { + color: #62687f; +} +.book.color-theme-2 .book-summary ul.summary li a, +.book.color-theme-2 .book-summary ul.summary li span { + color: #c1c6d7; + background: transparent; + font-weight: 600; +} +.book.color-theme-2 .book-summary ul.summary li.active > a, +.book.color-theme-2 .book-summary ul.summary li a:hover { + color: #f4f4f5; + background: #252737; + font-weight: 600; +} diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css new file mode 100644 index 000000000..2aabd3deb --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-highlight.css @@ -0,0 +1,426 @@ +.book .book-body .page-wrapper .page-inner section.normal pre, +.book .book-body .page-wrapper .page-inner section.normal code { + /* http://jmblog.github.com/color-themes-for-google-code-highlightjs */ + /* Tomorrow Comment */ + /* Tomorrow Red */ + /* Tomorrow Orange */ + /* Tomorrow Yellow */ + /* Tomorrow Green */ + /* Tomorrow Aqua */ + /* Tomorrow Blue */ + /* Tomorrow Purple */ +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-title { + color: #8e908c; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-tag, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-tag, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-regexp, +.book .book-body .page-wrapper 
.page-inner section.normal code .hljs-regexp, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-tag .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-tag .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-pi, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-pi, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal pre .html .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal code .html .hljs-doctype, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-id, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-id, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-class, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-class, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo { + color: #c82829; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-literal, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-literal, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-params, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-params, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-constant { + color: #f5871f; +} +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-class .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-class .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-rules .hljs-attribute, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-rules .hljs-attribute { + color: #eab700; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-string, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-value, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-value, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-inheritance, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-inheritance, +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-header, +.book .book-body .page-wrapper .page-inner section.normal code 
.hljs-header, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-symbol, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-symbol, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + color: #718c00; +} +.book .book-body .page-wrapper .page-inner section.normal pre .css .hljs-hexcolor, +.book .book-body .page-wrapper .page-inner section.normal code .css .hljs-hexcolor { + color: #3e999f; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal pre .python .hljs-decorator, +.book .book-body .page-wrapper .page-inner section.normal code .python .hljs-decorator, +.book .book-body .page-wrapper .page-inner section.normal pre .python .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .python .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-function .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-function .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-title .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-title .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal pre .perl .hljs-sub, +.book .book-body .page-wrapper .page-inner section.normal code .perl .hljs-sub, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal pre .coffeescript .hljs-title, +.book .book-body .page-wrapper .page-inner section.normal code .coffeescript .hljs-title { + color: #4271ae; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-function, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-function { + color: #8959a8; +} +.book .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book .book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + background: white; + color: #4d4d4c; + padding: 0.5em; +} +.book .book-body .page-wrapper .page-inner section.normal pre .coffeescript .javascript, +.book .book-body .page-wrapper .page-inner section.normal code .coffeescript .javascript, +.book .book-body .page-wrapper .page-inner section.normal pre .javascript .xml, +.book .book-body .page-wrapper .page-inner section.normal code .javascript .xml, +.book .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .javascript, +.book .book-body .page-wrapper .page-inner section.normal code .xml .javascript, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .vbscript, +.book .book-body .page-wrapper .page-inner section.normal code .xml .vbscript, +.book .book-body .page-wrapper .page-inner section.normal pre .xml .css, +.book .book-body .page-wrapper .page-inner section.normal code .xml .css, 
+.book .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + opacity: 0.5; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code { + /* + +Orginal Style from ethanschoonover.com/solarized (c) Jeremy Hull + +*/ + /* Solarized Green */ + /* Solarized Cyan */ + /* Solarized Blue */ + /* Solarized Yellow */ + /* Solarized Orange */ + /* Solarized Red */ + /* Solarized Violet */ +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + padding: 0.5em; + background: #fdf6e3; + color: #657b83; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-template_comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-template_comment, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .diff .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .diff .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-doctype, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-doctype, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-pi, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-pi, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .lisp .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .lisp .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-javadoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-javadoc { + color: #93a1a1; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-winutils, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-winutils, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .method, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .method, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-addition, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-addition, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-tag, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-tag, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-request, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-request, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-status, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-status, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre 
.nginx .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .nginx .hljs-title { + color: #859900; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-command, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-command, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-tag .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-tag .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-rules .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-rules .hljs-value, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-phpdoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-phpdoc, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-regexp, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-regexp, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-hexcolor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-hexcolor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_url, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_url { + color: #2aa198; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-localvars, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-localvars, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-chunk, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-chunk, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-decorator, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-decorator, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-identifier, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-identifier, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .vhdl .hljs-literal, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .vhdl .hljs-literal, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-id, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-id, 
+.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-function, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-function { + color: #268bd2; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .lisp .hljs-body, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .lisp .hljs-body, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .smalltalk .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .smalltalk .hljs-number, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-constant, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-class .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-class .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-parent, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-parent, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .haskell .hljs-type, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .haskell .hljs-type, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_reference, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_reference { + color: #b58900; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor .hljs-keyword, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-shebang, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-shebang, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-symbol, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-symbol, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-symbol .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-symbol .hljs-string, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .diff .hljs-change, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .diff .hljs-change, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-special, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-special, 
+.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-attr_selector, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-attr_selector, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-subst, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-subst, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-cdata, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-cdata, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .clojure .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .clojure .hljs-title, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-header, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-header { + color: #cb4b16; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-deletion, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-deletion, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-important, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-important { + color: #dc322f; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .hljs-link_label, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .hljs-link_label { + color: #6c71c4; +} +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-1 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula { + background: #eee8d5; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code { + /* Tomorrow Night Bright Theme */ + /* Original theme - https://github.com/chriskempson/tomorrow-theme */ + /* http://jmblog.github.com/color-themes-for-google-code-highlightjs */ + /* Tomorrow Comment */ + /* Tomorrow Red */ + /* Tomorrow Orange */ + /* Tomorrow Yellow */ + /* Tomorrow Green */ + /* Tomorrow Aqua */ + /* Tomorrow Blue */ + /* Tomorrow Purple */ +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-comment, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-comment, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-title { + color: #969896; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-variable, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-variable, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-tag, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-tag, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-regexp, 
+.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-regexp, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-tag .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-tag .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-pi, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-pi, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .html .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .html .hljs-doctype, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-id, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-id, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-class, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-class, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-pseudo, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-pseudo { + color: #d54e53; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-number, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-number, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-preprocessor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-preprocessor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-pragma, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-pragma, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-built_in, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-built_in, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-literal, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-literal, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-params, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-params, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-constant, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-constant { + color: #e78c45; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-class .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-class .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-rules .hljs-attribute, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-rules .hljs-attribute { + color: #e7c547; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-string, 
+.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-string, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-value, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-value, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-inheritance, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-inheritance, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-header, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-header, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-symbol, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-symbol, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + color: #b9ca4a; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .css .hljs-hexcolor, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .css .hljs-hexcolor { + color: #70c0b1; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .python .hljs-decorator, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .python .hljs-decorator, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .python .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .python .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-function .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-function .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .ruby .hljs-title .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .ruby .hljs-title .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .perl .hljs-sub, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .perl .hljs-sub, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .coffeescript .hljs-title, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .coffeescript .hljs-title { + color: #7aa6da; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs-keyword, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .hljs-function, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .hljs-function { + color: #c397d8; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .hljs, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .hljs { + display: block; + background: 
black; + color: #eaeaea; + padding: 0.5em; +} +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .coffeescript .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .coffeescript .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .javascript .xml, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .javascript .xml, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .tex .hljs-formula, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .tex .hljs-formula, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .javascript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .vbscript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .vbscript, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .css, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .css, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal pre .xml .hljs-cdata, +.book.color-theme-2 .book-body .page-wrapper .page-inner section.normal code .xml .hljs-cdata { + opacity: 0.5; +} diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css new file mode 100644 index 000000000..d7ff2d991 --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-search.css @@ -0,0 +1,28 @@ +.book .book-summary .book-search { + padding: 6px; + background: transparent; + position: absolute; + top: -50px; + left: 0px; + right: 0px; + transition: top 0.5s ease; +} +.book .book-summary .book-search input, +.book .book-summary .book-search input:focus, +.book .book-summary .book-search input:hover { + width: 100%; + background: transparent; + border: 1px solid #ccc; + box-shadow: none; + outline: none; + line-height: 22px; + padding: 7px 4px; + color: inherit; + box-sizing: border-box; +} +.book.with-search .book-summary .book-search { + top: 0px; +} +.book.with-search .book-summary ul.summary { + top: 50px; +} diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css new file mode 100644 index 000000000..7fba1b9fb --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/plugin-table.css @@ -0,0 +1 @@ +.book .book-body .page-wrapper .page-inner section.normal table{display:table;width:100%;border-collapse:collapse;border-spacing:0;overflow:auto}.book .book-body .page-wrapper .page-inner section.normal table td,.book .book-body .page-wrapper .page-inner section.normal table th{padding:6px 13px;border:1px solid #ddd}.book .book-body .page-wrapper .page-inner section.normal table tr{background-color:#fff;border-top:1px solid #ccc}.book .book-body .page-wrapper .page-inner section.normal table tr:nth-child(2n){background-color:#f8f8f8}.book .book-body .page-wrapper .page-inner section.normal table th{font-weight:700} diff --git a/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css new file mode 100644 index 000000000..b89689209 --- /dev/null +++ b/previous_versions/v0.4.0/libs/gitbook-2.6.7/css/style.css @@ -0,0 +1,10 @@ +/*! 
normalize.css v2.1.0 | MIT License | git.io/normalize */img,legend{border:0}*,.fa{-webkit-font-smoothing:antialiased}.fa-ul>li,sub,sup{position:relative}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book-langs-index .inner .languages:after,.buttons:after,.dropdown-menu .buttons:after{clear:both}body,html{-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}article,aside,details,figcaption,figure,footer,header,hgroup,main,nav,section,summary{display:block}audio,canvas,video{display:inline-block}.hidden,[hidden]{display:none}audio:not([controls]){display:none;height:0}html{font-family:sans-serif}body,figure{margin:0}a:focus{outline:dotted thin}a:active,a:hover{outline:0}h1{font-size:2em;margin:.67em 0}abbr[title]{border-bottom:1px dotted}b,strong{font-weight:700}dfn{font-style:italic}hr{-moz-box-sizing:content-box;box-sizing:content-box;height:0}mark{background:#ff0;color:#000}code,kbd,pre,samp{font-family:monospace,serif;font-size:1em}pre{white-space:pre-wrap}q{quotes:"\201C" "\201D" "\2018" "\2019"}small{font-size:80%}sub,sup{font-size:75%;line-height:0;vertical-align:baseline}sup{top:-.5em}sub{bottom:-.25em}svg:not(:root){overflow:hidden}fieldset{border:1px solid silver;margin:0 2px;padding:.35em .625em .75em}legend{padding:0}button,input,select,textarea{font-family:inherit;font-size:100%;margin:0}button,input{line-height:normal}button,select{text-transform:none}button,html input[type=button],input[type=reset],input[type=submit]{-webkit-appearance:button;cursor:pointer}button[disabled],html input[disabled]{cursor:default}input[type=checkbox],input[type=radio]{box-sizing:border-box;padding:0}input[type=search]{-webkit-appearance:textfield;-moz-box-sizing:content-box;-webkit-box-sizing:content-box;box-sizing:content-box}input[type=search]::-webkit-search-cancel-button{margin-right:10px;}button::-moz-focus-inner,input::-moz-focus-inner{border:0;padding:0}textarea{overflow:auto;vertical-align:top}table{border-collapse:collapse;border-spacing:0}/*! + * Preboot v2 + * + * Open sourced under MIT license by @mdo. + * Some variables and mixins from Bootstrap (Apache 2 license). + */.link-inherit,.link-inherit:focus,.link-inherit:hover{color:inherit}.fa,.fa-stack{display:inline-block}/*! 
+ * Font Awesome 4.1.0 by @davegandy - http://fontawesome.io - @fontawesome + * License - http://fontawesome.io/license (Font: SIL OFL 1.1, CSS: MIT License) + */@font-face{font-family:FontAwesome;src:url(./fontawesome/fontawesome-webfont.ttf?v=4.1.0) format('truetype');font-weight:400;font-style:normal}.fa{font-family:FontAwesome;font-style:normal;font-weight:400;line-height:1;-moz-osx-font-smoothing:grayscale}.book .book-header,.book .book-summary{font-family:"Helvetica Neue",Helvetica,Arial,sans-serif}.fa-lg{font-size:1.33333333em;line-height:.75em;vertical-align:-15%}.fa-2x{font-size:2em}.fa-3x{font-size:3em}.fa-4x{font-size:4em}.fa-5x{font-size:5em}.fa-fw{width:1.28571429em;text-align:center}.fa-ul{padding-left:0;margin-left:2.14285714em;list-style-type:none}.fa-li{position:absolute;left:-2.14285714em;width:2.14285714em;top:.14285714em;text-align:center}.fa-li.fa-lg{left:-1.85714286em}.fa-border{padding:.2em .25em .15em;border:.08em solid #eee;border-radius:.1em}.pull-right{float:right}.pull-left{float:left}.fa.pull-left{margin-right:.3em}.fa.pull-right{margin-left:.3em}.fa-spin{-webkit-animation:spin 2s infinite linear;-moz-animation:spin 2s infinite linear;-o-animation:spin 2s infinite linear;animation:spin 2s infinite linear}@-moz-keyframes spin{0%{-moz-transform:rotate(0)}100%{-moz-transform:rotate(359deg)}}@-webkit-keyframes spin{0%{-webkit-transform:rotate(0)}100%{-webkit-transform:rotate(359deg)}}@-o-keyframes spin{0%{-o-transform:rotate(0)}100%{-o-transform:rotate(359deg)}}@keyframes spin{0%{-webkit-transform:rotate(0);transform:rotate(0)}100%{-webkit-transform:rotate(359deg);transform:rotate(359deg)}}.fa-rotate-90{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=1);-webkit-transform:rotate(90deg);-moz-transform:rotate(90deg);-ms-transform:rotate(90deg);-o-transform:rotate(90deg);transform:rotate(90deg)}.fa-rotate-180{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=2);-webkit-transform:rotate(180deg);-moz-transform:rotate(180deg);-ms-transform:rotate(180deg);-o-transform:rotate(180deg);transform:rotate(180deg)}.fa-rotate-270{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=3);-webkit-transform:rotate(270deg);-moz-transform:rotate(270deg);-ms-transform:rotate(270deg);-o-transform:rotate(270deg);transform:rotate(270deg)}.fa-flip-horizontal{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=0, mirror=1);-webkit-transform:scale(-1,1);-moz-transform:scale(-1,1);-ms-transform:scale(-1,1);-o-transform:scale(-1,1);transform:scale(-1,1)}.fa-flip-vertical{filter:progid:DXImageTransform.Microsoft.BasicImage(rotation=2, 
mirror=1);-webkit-transform:scale(1,-1);-moz-transform:scale(1,-1);-ms-transform:scale(1,-1);-o-transform:scale(1,-1);transform:scale(1,-1)}.fa-stack{position:relative;width:2em;height:2em;line-height:2em;vertical-align:middle}.fa-stack-1x,.fa-stack-2x{position:absolute;left:0;width:100%;text-align:center}.fa-stack-1x{line-height:inherit}.fa-stack-2x{font-size:2em}.fa-inverse{color:#fff}.fa-glass:before{content:"\f000"}.fa-music:before{content:"\f001"}.fa-search:before{content:"\f002"}.fa-envelope-o:before{content:"\f003"}.fa-heart:before{content:"\f004"}.fa-star:before{content:"\f005"}.fa-star-o:before{content:"\f006"}.fa-user:before{content:"\f007"}.fa-film:before{content:"\f008"}.fa-th-large:before{content:"\f009"}.fa-th:before{content:"\f00a"}.fa-th-list:before{content:"\f00b"}.fa-check:before{content:"\f00c"}.fa-times:before{content:"\f00d"}.fa-search-plus:before{content:"\f00e"}.fa-search-minus:before{content:"\f010"}.fa-power-off:before{content:"\f011"}.fa-signal:before{content:"\f012"}.fa-cog:before,.fa-gear:before{content:"\f013"}.fa-trash-o:before{content:"\f014"}.fa-home:before{content:"\f015"}.fa-file-o:before{content:"\f016"}.fa-clock-o:before{content:"\f017"}.fa-road:before{content:"\f018"}.fa-download:before{content:"\f019"}.fa-arrow-circle-o-down:before{content:"\f01a"}.fa-arrow-circle-o-up:before{content:"\f01b"}.fa-inbox:before{content:"\f01c"}.fa-play-circle-o:before{content:"\f01d"}.fa-repeat:before,.fa-rotate-right:before{content:"\f01e"}.fa-refresh:before{content:"\f021"}.fa-list-alt:before{content:"\f022"}.fa-lock:before{content:"\f023"}.fa-flag:before{content:"\f024"}.fa-headphones:before{content:"\f025"}.fa-volume-off:before{content:"\f026"}.fa-volume-down:before{content:"\f027"}.fa-volume-up:before{content:"\f028"}.fa-qrcode:before{content:"\f029"}.fa-barcode:before{content:"\f02a"}.fa-tag:before{content:"\f02b"}.fa-tags:before{content:"\f02c"}.fa-book:before{content:"\f02d"}.fa-bookmark:before{content:"\f02e"}.fa-print:before{content:"\f02f"}.fa-camera:before{content:"\f030"}.fa-font:before{content:"\f031"}.fa-bold:before{content:"\f032"}.fa-italic:before{content:"\f033"}.fa-text-height:before{content:"\f034"}.fa-text-width:before{content:"\f035"}.fa-align-left:before{content:"\f036"}.fa-align-center:before{content:"\f037"}.fa-align-right:before{content:"\f038"}.fa-align-justify:before{content:"\f039"}.fa-list:before{content:"\f03a"}.fa-dedent:before,.fa-outdent:before{content:"\f03b"}.fa-indent:before{content:"\f03c"}.fa-video-camera:before{content:"\f03d"}.fa-image:before,.fa-photo:before,.fa-picture-o:before{content:"\f03e"}.fa-pencil:before{content:"\f040"}.fa-map-marker:before{content:"\f041"}.fa-adjust:before{content:"\f042"}.fa-tint:before{content:"\f043"}.fa-edit:before,.fa-pencil-square-o:before{content:"\f044"}.fa-share-square-o:before{content:"\f045"}.fa-check-square-o:before{content:"\f046"}.fa-arrows:before{content:"\f047"}.fa-step-backward:before{content:"\f048"}.fa-fast-backward:before{content:"\f049"}.fa-backward:before{content:"\f04a"}.fa-play:before{content:"\f04b"}.fa-pause:before{content:"\f04c"}.fa-stop:before{content:"\f04d"}.fa-forward:before{content:"\f04e"}.fa-fast-forward:before{content:"\f050"}.fa-step-forward:before{content:"\f051"}.fa-eject:before{content:"\f052"}.fa-chevron-left:before{content:"\f053"}.fa-chevron-right:before{content:"\f054"}.fa-plus-circle:before{content:"\f055"}.fa-minus-circle:before{content:"\f056"}.fa-times-circle:before{content:"\f057"}.fa-check-circle:before{content:"\f058"}.fa-question-circle:before{content:"\
f059"}.fa-info-circle:before{content:"\f05a"}.fa-crosshairs:before{content:"\f05b"}.fa-times-circle-o:before{content:"\f05c"}.fa-check-circle-o:before{content:"\f05d"}.fa-ban:before{content:"\f05e"}.fa-arrow-left:before{content:"\f060"}.fa-arrow-right:before{content:"\f061"}.fa-arrow-up:before{content:"\f062"}.fa-arrow-down:before{content:"\f063"}.fa-mail-forward:before,.fa-share:before{content:"\f064"}.fa-expand:before{content:"\f065"}.fa-compress:before{content:"\f066"}.fa-plus:before{content:"\f067"}.fa-minus:before{content:"\f068"}.fa-asterisk:before{content:"\f069"}.fa-exclamation-circle:before{content:"\f06a"}.fa-gift:before{content:"\f06b"}.fa-leaf:before{content:"\f06c"}.fa-fire:before{content:"\f06d"}.fa-eye:before{content:"\f06e"}.fa-eye-slash:before{content:"\f070"}.fa-exclamation-triangle:before,.fa-warning:before{content:"\f071"}.fa-plane:before{content:"\f072"}.fa-calendar:before{content:"\f073"}.fa-random:before{content:"\f074"}.fa-comment:before{content:"\f075"}.fa-magnet:before{content:"\f076"}.fa-chevron-up:before{content:"\f077"}.fa-chevron-down:before{content:"\f078"}.fa-retweet:before{content:"\f079"}.fa-shopping-cart:before{content:"\f07a"}.fa-folder:before{content:"\f07b"}.fa-folder-open:before{content:"\f07c"}.fa-arrows-v:before{content:"\f07d"}.fa-arrows-h:before{content:"\f07e"}.fa-bar-chart-o:before{content:"\f080"}.fa-twitter-square:before{content:"\f081"}.fa-facebook-square:before{content:"\f082"}.fa-camera-retro:before{content:"\f083"}.fa-key:before{content:"\f084"}.fa-cogs:before,.fa-gears:before{content:"\f085"}.fa-comments:before{content:"\f086"}.fa-thumbs-o-up:before{content:"\f087"}.fa-thumbs-o-down:before{content:"\f088"}.fa-star-half:before{content:"\f089"}.fa-heart-o:before{content:"\f08a"}.fa-sign-out:before{content:"\f08b"}.fa-linkedin-square:before{content:"\f08c"}.fa-thumb-tack:before{content:"\f08d"}.fa-external-link:before{content:"\f08e"}.fa-sign-in:before{content:"\f090"}.fa-trophy:before{content:"\f091"}.fa-github-square:before{content:"\f092"}.fa-upload:before{content:"\f093"}.fa-lemon-o:before{content:"\f094"}.fa-phone:before{content:"\f095"}.fa-square-o:before{content:"\f096"}.fa-bookmark-o:before{content:"\f097"}.fa-phone-square:before{content:"\f098"}.fa-twitter:before{content:"\f099"}.fa-facebook:before{content:"\f09a"}.fa-github:before{content:"\f09b"}.fa-unlock:before{content:"\f09c"}.fa-credit-card:before{content:"\f09d"}.fa-rss:before{content:"\f09e"}.fa-hdd-o:before{content:"\f0a0"}.fa-bullhorn:before{content:"\f0a1"}.fa-bell:before{content:"\f0f3"}.fa-certificate:before{content:"\f0a3"}.fa-hand-o-right:before{content:"\f0a4"}.fa-hand-o-left:before{content:"\f0a5"}.fa-hand-o-up:before{content:"\f0a6"}.fa-hand-o-down:before{content:"\f0a7"}.fa-arrow-circle-left:before{content:"\f0a8"}.fa-arrow-circle-right:before{content:"\f0a9"}.fa-arrow-circle-up:before{content:"\f0aa"}.fa-arrow-circle-down:before{content:"\f0ab"}.fa-globe:before{content:"\f0ac"}.fa-wrench:before{content:"\f0ad"}.fa-tasks:before{content:"\f0ae"}.fa-filter:before{content:"\f0b0"}.fa-briefcase:before{content:"\f0b1"}.fa-arrows-alt:before{content:"\f0b2"}.fa-group:before,.fa-users:before{content:"\f0c0"}.fa-chain:before,.fa-link:before{content:"\f0c1"}.fa-cloud:before{content:"\f0c2"}.fa-flask:before{content:"\f0c3"}.fa-cut:before,.fa-scissors:before{content:"\f0c4"}.fa-copy:before,.fa-files-o:before{content:"\f0c5"}.fa-paperclip:before{content:"\f0c6"}.fa-floppy-o:before,.fa-save:before{content:"\f0c7"}.fa-square:before{content:"\f0c8"}.fa-bars:before,.fa-navicon:befo
re,.fa-reorder:before{content:"\f0c9"}.fa-list-ul:before{content:"\f0ca"}.fa-list-ol:before{content:"\f0cb"}.fa-strikethrough:before{content:"\f0cc"}.fa-underline:before{content:"\f0cd"}.fa-table:before{content:"\f0ce"}.fa-magic:before{content:"\f0d0"}.fa-truck:before{content:"\f0d1"}.fa-pinterest:before{content:"\f0d2"}.fa-pinterest-square:before{content:"\f0d3"}.fa-google-plus-square:before{content:"\f0d4"}.fa-google-plus:before{content:"\f0d5"}.fa-money:before{content:"\f0d6"}.fa-caret-down:before{content:"\f0d7"}.fa-caret-up:before{content:"\f0d8"}.fa-caret-left:before{content:"\f0d9"}.fa-caret-right:before{content:"\f0da"}.fa-columns:before{content:"\f0db"}.fa-sort:before,.fa-unsorted:before{content:"\f0dc"}.fa-sort-desc:before,.fa-sort-down:before{content:"\f0dd"}.fa-sort-asc:before,.fa-sort-up:before{content:"\f0de"}.fa-envelope:before{content:"\f0e0"}.fa-linkedin:before{content:"\f0e1"}.fa-rotate-left:before,.fa-undo:before{content:"\f0e2"}.fa-gavel:before,.fa-legal:before{content:"\f0e3"}.fa-dashboard:before,.fa-tachometer:before{content:"\f0e4"}.fa-comment-o:before{content:"\f0e5"}.fa-comments-o:before{content:"\f0e6"}.fa-bolt:before,.fa-flash:before{content:"\f0e7"}.fa-sitemap:before{content:"\f0e8"}.fa-umbrella:before{content:"\f0e9"}.fa-clipboard:before,.fa-paste:before{content:"\f0ea"}.fa-lightbulb-o:before{content:"\f0eb"}.fa-exchange:before{content:"\f0ec"}.fa-cloud-download:before{content:"\f0ed"}.fa-cloud-upload:before{content:"\f0ee"}.fa-user-md:before{content:"\f0f0"}.fa-stethoscope:before{content:"\f0f1"}.fa-suitcase:before{content:"\f0f2"}.fa-bell-o:before{content:"\f0a2"}.fa-coffee:before{content:"\f0f4"}.fa-cutlery:before{content:"\f0f5"}.fa-file-text-o:before{content:"\f0f6"}.fa-building-o:before{content:"\f0f7"}.fa-hospital-o:before{content:"\f0f8"}.fa-ambulance:before{content:"\f0f9"}.fa-medkit:before{content:"\f0fa"}.fa-fighter-jet:before{content:"\f0fb"}.fa-beer:before{content:"\f0fc"}.fa-h-square:before{content:"\f0fd"}.fa-plus-square:before{content:"\f0fe"}.fa-angle-double-left:before{content:"\f100"}.fa-angle-double-right:before{content:"\f101"}.fa-angle-double-up:before{content:"\f102"}.fa-angle-double-down:before{content:"\f103"}.fa-angle-left:before{content:"\f104"}.fa-angle-right:before{content:"\f105"}.fa-angle-up:before{content:"\f106"}.fa-angle-down:before{content:"\f107"}.fa-desktop:before{content:"\f108"}.fa-laptop:before{content:"\f109"}.fa-tablet:before{content:"\f10a"}.fa-mobile-phone:before,.fa-mobile:before{content:"\f10b"}.fa-circle-o:before{content:"\f10c"}.fa-quote-left:before{content:"\f10d"}.fa-quote-right:before{content:"\f10e"}.fa-spinner:before{content:"\f110"}.fa-circle:before{content:"\f111"}.fa-mail-reply:before,.fa-reply:before{content:"\f112"}.fa-github-alt:before{content:"\f113"}.fa-folder-o:before{content:"\f114"}.fa-folder-open-o:before{content:"\f115"}.fa-smile-o:before{content:"\f118"}.fa-frown-o:before{content:"\f119"}.fa-meh-o:before{content:"\f11a"}.fa-gamepad:before{content:"\f11b"}.fa-keyboard-o:before{content:"\f11c"}.fa-flag-o:before{content:"\f11d"}.fa-flag-checkered:before{content:"\f11e"}.fa-terminal:before{content:"\f120"}.fa-code:before{content:"\f121"}.fa-mail-reply-all:before,.fa-reply-all:before{content:"\f122"}.fa-star-half-empty:before,.fa-star-half-full:before,.fa-star-half-o:before{content:"\f123"}.fa-location-arrow:before{content:"\f124"}.fa-crop:before{content:"\f125"}.fa-code-fork:before{content:"\f126"}.fa-chain-broken:before,.fa-unlink:before{content:"\f127"}.fa-question:before{content:"\f128"}.fa-info:b
efore{content:"\f129"}.fa-exclamation:before{content:"\f12a"}.fa-superscript:before{content:"\f12b"}.fa-subscript:before{content:"\f12c"}.fa-eraser:before{content:"\f12d"}.fa-puzzle-piece:before{content:"\f12e"}.fa-microphone:before{content:"\f130"}.fa-microphone-slash:before{content:"\f131"}.fa-shield:before{content:"\f132"}.fa-calendar-o:before{content:"\f133"}.fa-fire-extinguisher:before{content:"\f134"}.fa-rocket:before{content:"\f135"}.fa-maxcdn:before{content:"\f136"}.fa-chevron-circle-left:before{content:"\f137"}.fa-chevron-circle-right:before{content:"\f138"}.fa-chevron-circle-up:before{content:"\f139"}.fa-chevron-circle-down:before{content:"\f13a"}.fa-html5:before{content:"\f13b"}.fa-css3:before{content:"\f13c"}.fa-anchor:before{content:"\f13d"}.fa-unlock-alt:before{content:"\f13e"}.fa-bullseye:before{content:"\f140"}.fa-ellipsis-h:before{content:"\f141"}.fa-ellipsis-v:before{content:"\f142"}.fa-rss-square:before{content:"\f143"}.fa-play-circle:before{content:"\f144"}.fa-ticket:before{content:"\f145"}.fa-minus-square:before{content:"\f146"}.fa-minus-square-o:before{content:"\f147"}.fa-level-up:before{content:"\f148"}.fa-level-down:before{content:"\f149"}.fa-check-square:before{content:"\f14a"}.fa-pencil-square:before{content:"\f14b"}.fa-external-link-square:before{content:"\f14c"}.fa-share-square:before{content:"\f14d"}.fa-compass:before{content:"\f14e"}.fa-caret-square-o-down:before,.fa-toggle-down:before{content:"\f150"}.fa-caret-square-o-up:before,.fa-toggle-up:before{content:"\f151"}.fa-caret-square-o-right:before,.fa-toggle-right:before{content:"\f152"}.fa-eur:before,.fa-euro:before{content:"\f153"}.fa-gbp:before{content:"\f154"}.fa-dollar:before,.fa-usd:before{content:"\f155"}.fa-inr:before,.fa-rupee:before{content:"\f156"}.fa-cny:before,.fa-jpy:before,.fa-rmb:before,.fa-yen:before{content:"\f157"}.fa-rouble:before,.fa-rub:before,.fa-ruble:before{content:"\f158"}.fa-krw:before,.fa-won:before{content:"\f159"}.fa-bitcoin:before,.fa-btc:before{content:"\f15a"}.fa-file:before{content:"\f15b"}.fa-file-text:before{content:"\f15c"}.fa-sort-alpha-asc:before{content:"\f15d"}.fa-sort-alpha-desc:before{content:"\f15e"}.fa-sort-amount-asc:before{content:"\f160"}.fa-sort-amount-desc:before{content:"\f161"}.fa-sort-numeric-asc:before{content:"\f162"}.fa-sort-numeric-desc:before{content:"\f163"}.fa-thumbs-up:before{content:"\f164"}.fa-thumbs-down:before{content:"\f165"}.fa-youtube-square:before{content:"\f166"}.fa-youtube:before{content:"\f167"}.fa-xing:before{content:"\f168"}.fa-xing-square:before{content:"\f169"}.fa-youtube-play:before{content:"\f16a"}.fa-dropbox:before{content:"\f16b"}.fa-stack-overflow:before{content:"\f16c"}.fa-instagram:before{content:"\f16d"}.fa-flickr:before{content:"\f16e"}.fa-adn:before{content:"\f170"}.fa-bitbucket:before{content:"\f171"}.fa-bitbucket-square:before{content:"\f172"}.fa-tumblr:before{content:"\f173"}.fa-tumblr-square:before{content:"\f174"}.fa-long-arrow-down:before{content:"\f175"}.fa-long-arrow-up:before{content:"\f176"}.fa-long-arrow-left:before{content:"\f177"}.fa-long-arrow-right:before{content:"\f178"}.fa-apple:before{content:"\f179"}.fa-windows:before{content:"\f17a"}.fa-android:before{content:"\f17b"}.fa-linux:before{content:"\f17c"}.fa-dribbble:before{content:"\f17d"}.fa-skype:before{content:"\f17e"}.fa-foursquare:before{content:"\f180"}.fa-trello:before{content:"\f181"}.fa-female:before{content:"\f182"}.fa-male:before{content:"\f183"}.fa-gittip:before{content:"\f184"}.fa-sun-o:before{content:"\f185"}.fa-moon-o:before{content:"\f186"}.fa-a
rchive:before{content:"\f187"}.fa-bug:before{content:"\f188"}.fa-vk:before{content:"\f189"}.fa-weibo:before{content:"\f18a"}.fa-renren:before{content:"\f18b"}.fa-pagelines:before{content:"\f18c"}.fa-stack-exchange:before{content:"\f18d"}.fa-arrow-circle-o-right:before{content:"\f18e"}.fa-arrow-circle-o-left:before{content:"\f190"}.fa-caret-square-o-left:before,.fa-toggle-left:before{content:"\f191"}.fa-dot-circle-o:before{content:"\f192"}.fa-wheelchair:before{content:"\f193"}.fa-vimeo-square:before{content:"\f194"}.fa-try:before,.fa-turkish-lira:before{content:"\f195"}.fa-plus-square-o:before{content:"\f196"}.fa-space-shuttle:before{content:"\f197"}.fa-slack:before{content:"\f198"}.fa-envelope-square:before{content:"\f199"}.fa-wordpress:before{content:"\f19a"}.fa-openid:before{content:"\f19b"}.fa-bank:before,.fa-institution:before,.fa-university:before{content:"\f19c"}.fa-graduation-cap:before,.fa-mortar-board:before{content:"\f19d"}.fa-yahoo:before{content:"\f19e"}.fa-google:before{content:"\f1a0"}.fa-reddit:before{content:"\f1a1"}.fa-reddit-square:before{content:"\f1a2"}.fa-stumbleupon-circle:before{content:"\f1a3"}.fa-stumbleupon:before{content:"\f1a4"}.fa-delicious:before{content:"\f1a5"}.fa-digg:before{content:"\f1a6"}.fa-pied-piper-square:before,.fa-pied-piper:before{content:"\f1a7"}.fa-pied-piper-alt:before{content:"\f1a8"}.fa-drupal:before{content:"\f1a9"}.fa-joomla:before{content:"\f1aa"}.fa-language:before{content:"\f1ab"}.fa-fax:before{content:"\f1ac"}.fa-building:before{content:"\f1ad"}.fa-child:before{content:"\f1ae"}.fa-paw:before{content:"\f1b0"}.fa-spoon:before{content:"\f1b1"}.fa-cube:before{content:"\f1b2"}.fa-cubes:before{content:"\f1b3"}.fa-behance:before{content:"\f1b4"}.fa-behance-square:before{content:"\f1b5"}.fa-steam:before{content:"\f1b6"}.fa-steam-square:before{content:"\f1b7"}.fa-recycle:before{content:"\f1b8"}.fa-automobile:before,.fa-car:before{content:"\f1b9"}.fa-cab:before,.fa-taxi:before{content:"\f1ba"}.fa-tree:before{content:"\f1bb"}.fa-spotify:before{content:"\f1bc"}.fa-deviantart:before{content:"\f1bd"}.fa-soundcloud:before{content:"\f1be"}.fa-database:before{content:"\f1c0"}.fa-file-pdf-o:before{content:"\f1c1"}.fa-file-word-o:before{content:"\f1c2"}.fa-file-excel-o:before{content:"\f1c3"}.fa-file-powerpoint-o:before{content:"\f1c4"}.fa-file-image-o:before,.fa-file-photo-o:before,.fa-file-picture-o:before{content:"\f1c5"}.fa-file-archive-o:before,.fa-file-zip-o:before{content:"\f1c6"}.fa-file-audio-o:before,.fa-file-sound-o:before{content:"\f1c7"}.fa-file-movie-o:before,.fa-file-video-o:before{content:"\f1c8"}.fa-file-code-o:before{content:"\f1c9"}.fa-vine:before{content:"\f1ca"}.fa-codepen:before{content:"\f1cb"}.fa-jsfiddle:before{content:"\f1cc"}.fa-life-bouy:before,.fa-life-ring:before,.fa-life-saver:before,.fa-support:before{content:"\f1cd"}.fa-circle-o-notch:before{content:"\f1ce"}.fa-ra:before,.fa-rebel:before{content:"\f1d0"}.fa-empire:before,.fa-ge:before{content:"\f1d1"}.fa-git-square:before{content:"\f1d2"}.fa-git:before{content:"\f1d3"}.fa-hacker-news:before{content:"\f1d4"}.fa-tencent-weibo:before{content:"\f1d5"}.fa-qq:before{content:"\f1d6"}.fa-wechat:before,.fa-weixin:before{content:"\f1d7"}.fa-paper-plane:before,.fa-send:before{content:"\f1d8"}.fa-paper-plane-o:before,.fa-send-o:before{content:"\f1d9"}.fa-history:before{content:"\f1da"}.fa-circle-thin:before{content:"\f1db"}.fa-header:before{content:"\f1dc"}.fa-paragraph:before{content:"\f1dd"}.fa-sliders:before{content:"\f1de"}.fa-share-alt:before{content:"\f1e0"}.fa-share-alt-square:b
efore{content:"\f1e1"}.fa-bomb:before{content:"\f1e2"}.book-langs-index{width:100%;height:100%;padding:40px 0;margin:0;overflow:auto}@media (max-width:600px){.book-langs-index{padding:0}}.book-langs-index .inner{max-width:600px;width:100%;margin:0 auto;padding:30px;background:#fff;border-radius:3px}.book-langs-index .inner h3{margin:0}.book-langs-index .inner .languages{list-style:none;padding:20px 30px;margin-top:20px;border-top:1px solid #eee}.book-langs-index .inner .languages:after,.book-langs-index .inner .languages:before{content:" ";display:table;line-height:0}.book-langs-index .inner .languages li{width:50%;float:left;padding:10px 5px;font-size:16px}@media (max-width:600px){.book-langs-index .inner .languages li{width:100%;max-width:100%}}.book .book-header{overflow:visible;height:50px;padding:0 8px;z-index:2;font-size:.85em;color:#7e888b;background:0 0}.book .book-header .btn{display:block;height:50px;padding:0 15px;border-bottom:none;color:#ccc;text-transform:uppercase;line-height:50px;-webkit-box-shadow:none!important;box-shadow:none!important;position:relative;font-size:14px}.book .book-header .btn:hover{position:relative;text-decoration:none;color:#444;background:0 0}.book .book-header h1{margin:0;font-size:20px;font-weight:200;text-align:center;line-height:50px;opacity:0;padding-left:200px;padding-right:200px;-webkit-transition:opacity .2s ease;-moz-transition:opacity .2s ease;-o-transition:opacity .2s ease;transition:opacity .2s ease;overflow:hidden;text-overflow:ellipsis;white-space:nowrap}.book .book-header h1 a,.book .book-header h1 a:hover{color:inherit;text-decoration:none}@media (max-width:1000px){.book .book-header h1{display:none}}.book .book-header h1 i{display:none}.book .book-header:hover h1{opacity:1}.book.is-loading .book-header h1 i{display:inline-block}.book.is-loading .book-header h1 a{display:none}.dropdown{position:relative}.dropdown-menu{position:absolute;top:100%;left:0;z-index:100;display:none;float:left;min-width:160px;padding:0;margin:2px 0 0;list-style:none;font-size:14px;background-color:#fafafa;border:1px solid rgba(0,0,0,.07);border-radius:1px;-webkit-box-shadow:0 6px 12px rgba(0,0,0,.175);box-shadow:0 6px 12px rgba(0,0,0,.175);background-clip:padding-box}.dropdown-menu.open{display:block}.dropdown-menu.dropdown-left{left:auto;right:4%}.dropdown-menu.dropdown-left .dropdown-caret{right:14px;left:auto}.dropdown-menu .dropdown-caret{position:absolute;top:-8px;left:14px;width:18px;height:10px;float:left;overflow:hidden}.dropdown-menu .dropdown-caret .caret-inner,.dropdown-menu .dropdown-caret .caret-outer{display:inline-block;top:0;border-left:9px solid transparent;border-right:9px solid transparent;position:absolute}.dropdown-menu .dropdown-caret .caret-outer{border-bottom:9px solid rgba(0,0,0,.1);height:auto;left:0;width:auto;margin-left:-1px}.dropdown-menu .dropdown-caret .caret-inner{margin-top:-1px;top:1px;border-bottom:9px solid #fafafa}.dropdown-menu .buttons{border-bottom:1px solid rgba(0,0,0,.07)}.dropdown-menu .buttons:after,.dropdown-menu .buttons:before{content:" ";display:table;line-height:0}.dropdown-menu .buttons:last-child{border-bottom:none}.dropdown-menu .buttons .button{border:0;background-color:transparent;color:#a6a6a6;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 4px}.alert,.dropdown-menu .buttons .button:hover{color:#444}.dropdown-menu .buttons .button:focus,.dropdown-menu .buttons .button:hover{outline:0}.dropdown-menu .buttons .button.size-2{width:50%}.dropdown-menu .buttons 
.button.size-3{width:33%}.alert{padding:15px;margin-bottom:20px;background:#eee;border-bottom:5px solid #ddd}.alert-success{background:#dff0d8;border-color:#d6e9c6;color:#3c763d}.alert-info{background:#d9edf7;border-color:#bce8f1;color:#31708f}.alert-danger{background:#f2dede;border-color:#ebccd1;color:#a94442}.alert-warning{background:#fcf8e3;border-color:#faebcc;color:#8a6d3b}.book .book-summary{position:absolute;top:0;left:-300px;bottom:0;z-index:1;width:300px;color:#364149;background:#fafafa;border-right:1px solid rgba(0,0,0,.07);-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms ease}.book .book-summary ul.summary{position:absolute;top:0;left:0;right:0;bottom:0;overflow-y:auto;list-style:none;margin:0;padding:0;-webkit-transition:top .5s ease;-moz-transition:top .5s ease;-o-transition:top .5s ease;transition:top .5s ease}.book .book-summary ul.summary li{list-style:none}.book .book-summary ul.summary li.divider{height:1px;margin:7px 0;overflow:hidden;background:rgba(0,0,0,.07)}.book .book-summary ul.summary li i.fa-check{display:none;position:absolute;right:9px;top:16px;font-size:9px;color:#3c3}.book .book-summary ul.summary li.done>a{color:#364149;font-weight:400}.book .book-summary ul.summary li.done>a i{display:inline}.book .book-summary ul.summary li a,.book .book-summary ul.summary li span{display:block;padding:10px 15px;border-bottom:none;color:#364149;background:0 0;text-overflow:ellipsis;overflow:hidden;white-space:nowrap;position:relative}.book .book-summary ul.summary li span{cursor:not-allowed;opacity:.3;filter:alpha(opacity=30)}.book .book-summary ul.summary li a:hover,.book .book-summary ul.summary li.active>a{color:#008cff;background:0 0;text-decoration:none}.book .book-summary ul.summary li ul{padding-left:20px}@media (max-width:600px){.book .book-summary{width:calc(100% - 60px);bottom:0;left:-100%}}.book.with-summary .book-summary{left:0}.book.without-animation .book-summary{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.book{position:relative;width:100%;height:100%}.book .book-body,.book .book-body .body-inner{position:absolute;top:0;left:0;overflow-y:auto;bottom:0;right:0}.book .book-body{color:#000;background:#fff;-webkit-transition:left 250ms ease;-moz-transition:left 250ms ease;-o-transition:left 250ms ease;transition:left 250ms ease}.book .book-body .page-wrapper{position:relative;outline:0}.book .book-body .page-wrapper .page-inner{max-width:800px;margin:0 auto;padding:20px 0 40px}.book .book-body .page-wrapper .page-inner section{margin:0;padding:5px 15px;background:#fff;border-radius:2px;line-height:1.7;font-size:1.6rem}.book .book-body .page-wrapper .page-inner .btn-group .btn{border-radius:0;background:#eee;border:0}@media (max-width:1240px){.book .book-body{-webkit-transition:-webkit-transform 250ms ease;-moz-transition:-moz-transform 250ms ease;-o-transition:-o-transform 250ms ease;transition:transform 250ms ease;padding-bottom:20px}.book .book-body .body-inner{position:static;min-height:calc(100% - 50px)}}@media (min-width:600px){.book.with-summary .book-body{left:300px}}@media (max-width:600px){.book.with-summary{overflow:hidden}.book.with-summary .book-body{-webkit-transform:translate(calc(100% - 60px),0);-moz-transform:translate(calc(100% - 60px),0);-ms-transform:translate(calc(100% - 60px),0);-o-transform:translate(calc(100% - 60px),0);transform:translate(calc(100% - 60px),0)}}.book.without-animation 
.book-body{-webkit-transition:none!important;-moz-transition:none!important;-o-transition:none!important;transition:none!important}.buttons:after,.buttons:before{content:" ";display:table;line-height:0}.button{border:0;background:#eee;color:#666;width:100%;text-align:center;float:left;line-height:1.42857143;padding:8px 4px}.button:hover{color:#444}.button:focus,.button:hover{outline:0}.button.size-2{width:50%}.button.size-3{width:33%}.book .book-body .page-wrapper .page-inner section{display:none}.book .book-body .page-wrapper .page-inner section.normal{display:block;word-wrap:break-word;overflow:hidden;color:#333;line-height:1.7;text-size-adjust:100%;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%;-moz-text-size-adjust:100%}.book .book-body .page-wrapper .page-inner section.normal *{box-sizing:border-box;-webkit-box-sizing:border-box;}.book .book-body .page-wrapper .page-inner section.normal>:first-child{margin-top:0!important}.book .book-body .page-wrapper .page-inner section.normal>:last-child{margin-bottom:0!important}.book .book-body .page-wrapper .page-inner section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal figure,.book .book-body .page-wrapper .page-inner section.normal img,.book .book-body .page-wrapper .page-inner section.normal pre,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal tr{page-break-inside:avoid}.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal p{orphans:3;widows:3}.book .book-body .page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5{page-break-after:avoid}.book .book-body .page-wrapper .page-inner section.normal b,.book .book-body .page-wrapper .page-inner section.normal strong{font-weight:700}.book .book-body .page-wrapper .page-inner section.normal em{font-style:italic}.book .book-body .page-wrapper .page-inner section.normal blockquote,.book .book-body .page-wrapper .page-inner section.normal dl,.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal p,.book .book-body .page-wrapper .page-inner section.normal table,.book .book-body .page-wrapper .page-inner section.normal ul{margin-top:0;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal a{color:#4183c4;text-decoration:none;background:0 0}.book .book-body .page-wrapper .page-inner section.normal a:active,.book .book-body .page-wrapper .page-inner section.normal a:focus,.book .book-body .page-wrapper .page-inner section.normal a:hover{outline:0;text-decoration:underline}.book .book-body .page-wrapper .page-inner section.normal img{border:0;max-width:100%}.book .book-body .page-wrapper .page-inner section.normal hr{height:4px;padding:0;margin:1.7em 0;overflow:hidden;background-color:#e7e7e7;border:none}.book .book-body .page-wrapper .page-inner section.normal hr:after,.book .book-body .page-wrapper .page-inner section.normal hr:before{display:table;content:" "}.book .book-body 
.page-wrapper .page-inner section.normal h1,.book .book-body .page-wrapper .page-inner section.normal h2,.book .book-body .page-wrapper .page-inner section.normal h3,.book .book-body .page-wrapper .page-inner section.normal h4,.book .book-body .page-wrapper .page-inner section.normal h5,.book .book-body .page-wrapper .page-inner section.normal h6{margin-top:1.275em;margin-bottom:.85em;}.book .book-body .page-wrapper .page-inner section.normal h1{font-size:2em}.book .book-body .page-wrapper .page-inner section.normal h2{font-size:1.75em}.book .book-body .page-wrapper .page-inner section.normal h3{font-size:1.5em}.book .book-body .page-wrapper .page-inner section.normal h4{font-size:1.25em}.book .book-body .page-wrapper .page-inner section.normal h5{font-size:1em}.book .book-body .page-wrapper .page-inner section.normal h6{font-size:1em;color:#777}.book .book-body .page-wrapper .page-inner section.normal code,.book .book-body .page-wrapper .page-inner section.normal pre{font-family:Consolas,"Liberation Mono",Menlo,Courier,monospace;direction:ltr;border:none;color:inherit}.book .book-body .page-wrapper .page-inner section.normal pre{overflow:auto;word-wrap:normal;margin:0 0 1.275em;padding:.85em 1em;background:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal pre>code{display:inline;max-width:initial;padding:0;margin:0;overflow:initial;line-height:inherit;font-size:.85em;white-space:pre;background:0 0}.book .book-body .page-wrapper .page-inner section.normal pre>code:after,.book .book-body .page-wrapper .page-inner section.normal pre>code:before{content:normal}.book .book-body .page-wrapper .page-inner section.normal code{padding:.2em;margin:0;font-size:.85em;background-color:#f7f7f7}.book .book-body .page-wrapper .page-inner section.normal code:after,.book .book-body .page-wrapper .page-inner section.normal code:before{letter-spacing:-.2em;content:"\00a0"}.book .book-body .page-wrapper .page-inner section.normal ol,.book .book-body .page-wrapper .page-inner section.normal ul{padding:0 0 0 2em;margin:0 0 .85em}.book .book-body .page-wrapper .page-inner section.normal ol ol,.book .book-body .page-wrapper .page-inner section.normal ol ul,.book .book-body .page-wrapper .page-inner section.normal ul ol,.book .book-body .page-wrapper .page-inner section.normal ul ul{margin-top:0;margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal ol ol{list-style-type:lower-roman}.book .book-body .page-wrapper .page-inner section.normal blockquote{margin:0 0 .85em;padding:0 15px;opacity:0.75;border-left:4px solid #dcdcdc}.book .book-body .page-wrapper .page-inner section.normal blockquote:first-child{margin-top:0}.book .book-body .page-wrapper .page-inner section.normal blockquote:last-child{margin-bottom:0}.book .book-body .page-wrapper .page-inner section.normal dl{padding:0}.book .book-body .page-wrapper .page-inner section.normal dl dt{padding:0;margin-top:.85em;font-style:italic;font-weight:700}.book .book-body .page-wrapper .page-inner section.normal dl dd{padding:0 .85em;margin-bottom:.85em}.book .book-body .page-wrapper .page-inner section.normal dd{margin-left:0}.book .book-body .page-wrapper .page-inner section.normal .glossary-term{cursor:help;text-decoration:underline}.book .book-body .navigation{position:absolute;top:50px;bottom:0;margin:0;max-width:150px;min-width:90px;display:flex;justify-content:center;align-content:center;flex-direction:column;font-size:40px;color:#ccc;text-align:center;-webkit-transition:all 350ms ease;-moz-transition:all 350ms 