diff --git a/source/reading.Rmd b/source/reading.Rmd
index 11f987397..aef2f85f1 100644
--- a/source/reading.Rmd
+++ b/source/reading.Rmd
@@ -616,9 +616,10 @@ response for us. So `dbplyr` does all the hard work of translating from R to SQL
 we can just stick with R!
 
 With our `lang_db` table reference for the 2016 Canadian Census data in hand, we
-can mostly continue onward as if it were a regular data frame. For example,
-we can use the `filter` function
-to obtain only certain rows. Below we filter the data to include only Aboriginal languages.
+can mostly continue onward as if it were a regular data frame. For example, let's do the same exercise
+from Chapter \@ref(intro): we will obtain only those rows corresponding to Aboriginal languages, and keep only
+the `language` and `mother_tongue` columns.
+We can use the `filter` function to obtain only certain rows. Below we filter the data to include only Aboriginal languages.
 
 ```{r}
 aboriginal_lang_db <- filter(lang_db, category == "Aboriginal languages")
@@ -626,16 +627,27 @@ aboriginal_lang_db
 ```
 
 Above you can again see the hints that this data is not actually stored in R yet:
-the source is a `lazy query [?? x 6]` and the output says `... with more rows` at the end
+the source is `SQL [?? x 6]` and the output says `... more rows` at the end
 (both indicating that R does not know how many rows there are in total!),
-and a database type `sqlite 3.36.0` is listed.
+and a database type `sqlite` is listed.
+We didn't use the `collect` function because we are not ready to bring the data into R yet. \index{collect}
+We can still use the database to do some work to obtain *only* the small amount of data we want to work with locally
+in R. Let's add the second part of our database query: selecting only the `language` and `mother_tongue` columns
+using the `select` function.
+
+```{r}
+aboriginal_lang_selected_db <- select(aboriginal_lang_db, language, mother_tongue)
+aboriginal_lang_selected_db
+```
+
+Now you can see that the database will return only the two columns we asked for with the `select` function.
 In order to actually retrieve this data in R as a data frame,
 we use the `collect` function. \index{filter}
 Below you will see that after running `collect`, R knows that the retrieved
 data has 67 rows, and there is no database listed any more.
 
 ```{r}
-aboriginal_lang_data <- collect(aboriginal_lang_db)
+aboriginal_lang_data <- collect(aboriginal_lang_selected_db)
 aboriginal_lang_data
 ```
 
@@ -649,14 +661,14 @@ For example, look what happens when we try to use `nrow` to count rows
 in a data frame: \index{nrow}
 
 ```{r}
-nrow(aboriginal_lang_db)
+nrow(aboriginal_lang_selected_db)
 ```
 
 or `tail` to preview the last six rows of a data frame:
 \index{tail}
 
 ```{r, eval = FALSE}
-tail(aboriginal_lang_db)
+tail(aboriginal_lang_selected_db)
 ```
 
 ```
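The following is a minimal, self-contained sketch for trying the lazy `filter`, then `select`, then `collect` pipeline that this patch documents without access to the Census database. It assumes the DBI, RSQLite, dplyr, and dbplyr packages are installed. The table `toy_lang` and its values are hypothetical stand-ins, not the 2016 Census data, and they are written to a throwaway in-memory SQLite database.

```r
# Minimal sketch: lazy dbplyr queries against an in-memory SQLite database.
# Assumes DBI, RSQLite, dplyr, and dbplyr are installed. The data below is
# hypothetical toy data, NOT the 2016 Canadian Census data.
library(DBI)
library(dplyr)
library(dbplyr)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical stand-in for the Census language table
toy_lang <- data.frame(
  category = c("Aboriginal languages", "Official languages", "Aboriginal languages"),
  language = c("Language A", "Language B", "Language C"),
  mother_tongue = c(10, 20, 30),
  most_at_home = c(5, 15, 25)
)
dbWriteTable(con, "lang", toy_lang)

# tbl() returns a lazy table reference; no rows are pulled into R yet
lang_db <- tbl(con, "lang")

# Both steps only build up a SQL query inside the database connection
aboriginal_lang_db <- filter(lang_db, category == "Aboriginal languages")
aboriginal_lang_selected_db <- select(aboriginal_lang_db, language, mother_tongue)

# Print the SQL that dbplyr generated for the filter + select
show_query(aboriginal_lang_selected_db)

# The row count of a lazy query is unknown, so nrow() gives NA
nrow(aboriginal_lang_selected_db)

# collect() runs the query and returns an ordinary data frame (a tibble)
aboriginal_lang_data <- collect(aboriginal_lang_selected_db)
nrow(aboriginal_lang_data)

dbDisconnect(con)
```

Doing the `filter` and `select` in the database first means only the matching rows and the two requested columns are transferred into R when `collect` runs.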