# Tibble Joins
Today's topic is operations involving two or more tibbles.
## Worksheet
You can find a worksheet template for today [here](https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/tutorials/cm010-exercise.Rmd).
## Resources
- [Jenny's join cheatsheet](https://stat545.com/join-cheatsheet.html)
- dplyr's ["Two-table verbs" vignette](https://cran.r-project.org/web/packages/dplyr/vignettes/two-table.html)
- [Relational Data chapter](https://r4ds.had.co.nz/relational-data.html) in "R for Data Science".
- [dplyr cheatsheet](https://rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf)
For an overview of operations involving multiple tibbles, check out Jenny's [Chapter 14](https://stat545.com/multiple-tibbles.html) on stat545.com.
For more activities, check out [Rashedul's guest lecture material from 2018](https://github.com/Rashedul/stat545_guest_lecture).
## Join Functions (25 min)
Often, we need to work with data living in more than one table. There are four main types of operations that can be done with two tables; the first three are elaborated in the [r4ds Chapter 13 Introduction](https://r4ds.had.co.nz/relational-data.html#introduction-7):
- [__Mutating joins__](https://r4ds.had.co.nz/relational-data.html#mutating-joins) add new columns to the "original" tibble.
- [__Filtering joins__](https://r4ds.had.co.nz/relational-data.html#filtering-joins) filter the "original" tibble's rows.
- [__Set operations__](https://r4ds.had.co.nz/relational-data.html#set-operations) work as if each row is an element in a set.
- __Binding__ stacks tables on top of or beside each other, with `bind_rows()` and `bind_cols()`.
Let's navigate to each of the three links above, which lead to the relevant r4ds sections, and go through the concepts there. These sections have excellent visuals to explain what's going on.
Then, let's go through [Jenny's join cheatsheet](https://stat545.com/join-cheatsheet.html) for examples.
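Before opening the cheatsheet, here is a minimal sketch of one or two functions from each of the four operation types listed above. The tibbles `students`, `grades`, `a`, and `b` are hypothetical toy data invented for this illustration; they are not from the worksheet or the `singer` package.

```r
library(dplyr)
library(tibble)

# Hypothetical toy data (invented for illustration only)
students <- tibble(
  name    = c("Ana", "Ben", "Cho"),
  program = c("Stats", "Biology", "Stats")
)
grades <- tibble(
  name  = c("Ben", "Cho", "Dev"),
  grade = c(85, 92, 78)
)

# Mutating joins: add columns from `grades` to `students`
left_joined  <- left_join(students, grades, by = "name")   # keeps every student
inner_joined <- inner_join(students, grades, by = "name")  # only names in both
full_joined  <- full_join(students, grades, by = "name")   # names from either

# Filtering joins: keep or drop rows of `students`, adding no columns
with_grade    <- semi_join(students, grades, by = "name")  # students with a grade
without_grade <- anti_join(students, grades, by = "name")  # students without one

# Set operations: both tibbles must have the same columns
a <- tibble(x = c(1, 2, 3))
b <- tibble(x = c(3, 4))
in_either <- union(a, b)
in_both   <- intersect(a, b)
only_in_a <- setdiff(a, b)

# Binding: stack rows, or place columns side by side
stacked      <- bind_rows(a, b)
side_by_side <- bind_cols(students, tibble(id = 1:3))
```

Printing `left_joined` shows an `NA` grade for Ana, who has no match in `grades`, while `inner_joined` drops her row entirely. That difference is the main thing to watch for when choosing a mutating join.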
## Activity (25 min)
Let's complete [today's worksheet](https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/tutorials/cm010-exercise.Rmd).
In case you can't install the `singer` package, load the data directly by running these lines (`read_csv()` comes from the readr package):
```
library(readr)  # provides read_csv()
songs <- read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/singer/songs.csv")
locations <- read_csv("https://raw.githubusercontent.com/STAT545-UBC/Classroom/master/data/singer/loc.csv")
```
## Time remaining?
Let's return to the exercises from either:
- tidyr (last class)
- ggplot2 (the class before)