Commit

Merge pull request #559 from UBC-DSCI/db-create-column
Updates to DB section to match Python
trevorcampbell authored Nov 12, 2023
2 parents c9b277a + b4df504 commit 22baaf4
Showing 1 changed file with 20 additions and 8 deletions.
source/reading.Rmd: 28 changes (20 additions, 8 deletions)
@@ -616,26 +616,38 @@ response for us. So `dbplyr` does all the hard work of translating from R to SQL
we can just stick with R!

With our `lang_db` table reference for the 2016 Canadian Census data in hand, we
-can mostly continue onward as if it were a regular data frame. For example,
-we can use the `filter` function
-to obtain only certain rows. Below we filter the data to include only Aboriginal languages.
+can mostly continue onward as if it were a regular data frame. For example, let's do the same exercise
+from Chapter \@ref(intro): we will obtain only those rows corresponding to Aboriginal languages, and keep only
+the `language` and `mother_tongue` columns.
+We can use the `filter` function to obtain only certain rows. Below we filter the data to include only Aboriginal languages.

```{r}
aboriginal_lang_db <- filter(lang_db, category == "Aboriginal languages")
aboriginal_lang_db
```

Above you can again see the hints that this data is not actually stored in R yet:
-the source is a `lazy query [?? x 6]` and the output says `... with more rows` at the end
+the source is `SQL [?? x 6]` and the output says `... more rows` at the end
(both indicating that R does not know how many rows there are in total!),
-and a database type `sqlite 3.36.0` is listed.
+and a database type `sqlite` is listed.
+We didn't use the `collect` function because we are not ready to bring the data into R yet. \index{collect}
+We can still use the database to do some work to obtain *only* the small amount of data we want to work with locally
+in R. Let's add the second part of our database query: selecting only the `language` and `mother_tongue` columns
+using the `select` function.
+
+```{r}
+aboriginal_lang_selected_db <- select(aboriginal_lang_db, language, mother_tongue)
+aboriginal_lang_selected_db
+```
+
+Now you can see that the database will return only the two columns we asked for with the `select` function.
In order to actually retrieve this data in R as a data frame,
we use the `collect` function. \index{filter}
Below you will see that after running `collect`, R knows that the retrieved
data has 67 rows, and there is no database listed any more.

```{r}
-aboriginal_lang_data <- collect(aboriginal_lang_db)
+aboriginal_lang_data <- collect(aboriginal_lang_selected_db)
aboriginal_lang_data
```
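Illustrative sketch (not part of this commit): the lazy `filter`/`select`/`collect` workflow in the hunk above can be tried end to end on a throwaway in-memory SQLite database. It assumes the DBI, RSQLite, dplyr and dbplyr packages are installed; the `toy_lang` table, its column values, and the variable names are hypothetical placeholders, not the 2016 Canadian Census data used in the book. `show_query` prints the SQL that `dbplyr` generates, and `nrow` reports `NA` until `collect` brings the rows into R.

```r
# Sketch only: assumes DBI, RSQLite, dplyr and dbplyr are installed.
library(DBI)
library(dplyr)
library(dbplyr)

# Throwaway in-memory SQLite database (hypothetical stand-in for the Census database)
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical toy table; language names and counts are placeholders, not real data
toy_lang <- data.frame(
  category = c("Aboriginal languages", "Official languages", "Aboriginal languages"),
  language = c("lang_a", "lang_b", "lang_c"),
  mother_tongue = c(10, 20, 30),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "toy_lang", toy_lang)

# Lazy table reference: no rows are read into R here
toy_db <- tbl(con, "toy_lang")

# filter and select only extend the SQL query; the database does the work
toy_selected_db <- select(
  filter(toy_db, category == "Aboriginal languages"),
  language, mother_tongue
)

show_query(toy_selected_db)  # prints the generated SELECT ... WHERE ... statement
nrow(toy_selected_db)        # NA: the row count is unknown before collect

# collect runs the query and returns an ordinary tibble in R
toy_selected <- collect(toy_selected_db)
nrow(toy_selected)           # 2

dbDisconnect(con)
```

Because `filter` and `select` only build up the query, the row filtering and column selection happen inside the database, and `collect` is the single point where rows are copied into R.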

@@ -649,14 +661,14 @@ For example, look what happens when we try to use `nrow` to count rows
in a data frame: \index{nrow}

```{r}
-nrow(aboriginal_lang_db)
+nrow(aboriginal_lang_selected_db)
```

or `tail` to preview the last six rows of a data frame:
\index{tail}

```{r, eval = FALSE}
-tail(aboriginal_lang_db)
+tail(aboriginal_lang_selected_db)
```

```