Wrangling chapter review #235

Merged
merged 72 commits into from
Sep 27, 2021
5b1fc62
updating the wide/long sections as per reviewer E suggestions
leem44 Jun 27, 2021
fb7c142
adding convert argument to separate function
leem44 Jun 28, 2021
6b31e5c
updating mutate section to account for new convert in separate sectio…
leem44 Jun 28, 2021
c03342a
editing piping section to include when you might use temporary object…
leem44 Jun 28, 2021
bea4ae2
updating piping section as per Reviewer Cs suggestion
leem44 Jun 28, 2021
8c4e76c
adding why wide is bad
leem44 Jun 28, 2021
7b729c2
minor changes
leem44 Jun 28, 2021
bb989f6
changing column width in lang_long table so we can read all the rows
leem44 Jun 28, 2021
5f2d235
adding example for tibble
leem44 Jun 28, 2021
4a11419
adding summarize_if
leem44 Jun 28, 2021
31cdcdf
adding section about select helpers
leem44 Jun 29, 2021
a0f9749
added section on summarize +across
leem44 Jun 29, 2021
9b9f0c6
doing a pass through the chapter and editing grammar/spelling/logic
leem44 Jun 29, 2021
80df294
removing summarize_if since we have across
leem44 Jun 29, 2021
59dedd9
minor change
leem44 Jun 29, 2021
4a7b8a6
updating numbers in pivot longer table to match data frame
Jul 8, 2021
b94e2dc
merging with remote branch
Jul 8, 2021
fd487c5
adding image explanation for separate function
Jul 8, 2021
23a7442
moving section down to additional resources
Jul 29, 2021
c5b8533
wrapping text
Jul 29, 2021
c5dbe60
making corrections to the formatting
Jul 29, 2021
b74c232
fixing code box in wrong place
Aug 16, 2021
c74feab
updating numbers in pivot longer table to match data frame
Jul 8, 2021
48ad252
updating the wide/long sections as per reviewer E suggestions
leem44 Jun 27, 2021
c7206bb
adding convert argument to separate function
leem44 Jun 28, 2021
86e1b0d
updating mutate section to account for new convert in separate sectio…
leem44 Jun 28, 2021
3b6c87e
editing piping section to include when you might use temporary object…
leem44 Jun 28, 2021
0e969d5
updating piping section as per Reviewer Cs suggestion
leem44 Jun 28, 2021
6d5536a
adding why wide is bad
leem44 Jun 28, 2021
d7bdff8
minor changes
leem44 Jun 28, 2021
e23cc4d
changing column width in lang_long table so we can read all the rows
leem44 Jun 28, 2021
2bfdbcd
adding example for tibble
leem44 Jun 28, 2021
955d111
adding summarize_if
leem44 Jun 28, 2021
6a415d1
adding section about select helpers
leem44 Jun 29, 2021
316b1c0
added section on summarize +across
leem44 Jun 29, 2021
6f1d40d
doing a pass through the chapter and editing grammar/spelling/logic
leem44 Jun 29, 2021
d0be545
removing summarize_if since we have across
leem44 Jun 29, 2021
05da9ae
minor change
leem44 Jun 29, 2021
6547353
adding image explanation for separate function
Jul 8, 2021
ae49617
moving section down to additional resources
Jul 29, 2021
70d903c
wrapping text
Jul 29, 2021
e7f7172
making corrections to the formatting
Jul 29, 2021
81eb7e3
fixing code box in wrong place
Aug 16, 2021
16f55f7
renamed wrangling
Aug 16, 2021
54b40c5
editing map paragraph since we added summarize and across
Aug 16, 2021
84b50a4
adding section on rowwise
Aug 16, 2021
f2bdb3e
addressing reviewer Ds comments (adding table of summary wrangling fu…
Aug 16, 2021
2490eac
changing text up to 3.4.3: reordering vector, data frame, list sectio…
Aug 17, 2021
467066f
going through 3.4.3 - 3.5.1 and fixing the writing
Aug 17, 2021
2fe8478
fixing the writing for clarity, updating the explanation of pull
Aug 17, 2021
a217d0e
one last editing pass in the second half of the chapter
Aug 17, 2021
89d2ffd
split functions and operators across two bullets and alphabetized the…
ttimbers Sep 19, 2021
d3a29bc
combined row and observation figure into one, as the explanation seem…
ttimbers Sep 19, 2021
7af9dbb
swapped example to be character, not integer - because we call it an …
ttimbers Sep 19, 2021
ccb1d0e
fixed typo calling vector year that I changed to region
ttimbers Sep 19, 2021
919701c
a few more image changes to go with the character vector example
ttimbers Sep 19, 2021
0acd051
reviewed up until Tidy data
ttimbers Sep 19, 2021
2ecf1e0
changed some headers to title case to be consistent
ttimbers Sep 19, 2021
5551639
worked on tidying from wider to longer wording and removed loading da…
ttimbers Sep 20, 2021
0df1d18
fixed wrong tidy image
ttimbers Sep 22, 2021
3fbf22e
wording changes to tidy data section
ttimbers Sep 22, 2021
6858f74
wording changes up to the end of mutate
ttimbers Sep 22, 2021
a34e1ac
improved image size for fig 02-plot
ttimbers Sep 22, 2021
93d366e
simplified mutate as a new column example
ttimbers Sep 22, 2021
cd611ba
small plot changes related to simplifuing the mutate example
ttimbers Sep 22, 2021
3a651da
wording changes for the pipe section
ttimbers Sep 23, 2021
d03b1d0
edited wording in rowwise section
ttimbers Sep 23, 2021
15ec161
reorganized and simplied the summarize/purrr map/rowwise section. Sti…
ttimbers Sep 25, 2021
ec9a2af
added NA section for summarize + across
ttimbers Sep 26, 2021
2bc1b8f
Fixed images for pivoting and added images for aggregating
ttimbers Sep 26, 2021
ee93e23
tried to keep most text and all code to 80 characters
ttimbers Sep 26, 2021
5d2abc1
merging dev into wrangling
ttimbers Sep 27, 2021
wording changes for the pipe section
ttimbers committed Sep 23, 2021
commit 3a651daa3bd84e1e5bbcf4f377acd31f41f752bf
59 changes: 31 additions & 28 deletions wrangling.Rmd
@@ -770,13 +770,6 @@ than French in Montréal according to the 2016 Canadian census.

## Using `mutate` to modify or add columns

``` {r 02-character-official-langs, echo = FALSE}
official_langs_chr <- mutate(official_langs,
most_at_home = as.character(most_at_home),
most_at_work = as.character(most_at_work)
)
```

In section \@ref(separate),
when we first read in the `"region_lang_top5_cities_messy.csv"` data
all of the variables were "character" data types.
@@ -1004,13 +997,6 @@ output <- data |>
select(new_col)
```

> Note: The `|>` pipe operator was inspired by a previous version of the pipe
> operator, `%>%`. The `%>%` pipe operator is not built into R and needs to be
> loaded via an external R package. There are some other drawbacks to using
> `%>%`, which are beyond the scope of this textbook. Just be aware that `%>%` exists
> since you may see it used in some books or other sources. However, in this
> textbook, we will be using the base R pipe operator syntax, `|>`.

You can think of the pipe as a physical pipe. It takes the output from the
function on the left-hand side of the pipe, and passes it as the first argument
to the function on the right-hand side of the pipe. Note here that we have again
@@ -1023,6 +1009,20 @@ again is used to aid in improving code readability.
Next, let's learn about the details of using the pipe, and look at some examples
of how to use it in data analysis.
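
As a minimal sketch of the "first argument" rule described above, the following
base R snippet compares a nested call with its piped equivalent. It assumes
R 4.1 or later (the first release to include `|>`), and the values are made up
for illustration rather than taken from the chapter's data.

```r
# Made-up values for illustration only (not from the census data).
x <- c(1.234, 5.678, 9.1011)

# Nested call: x is the first argument to round(), digits is the second.
nested <- round(x, digits = 1)

# Piped call: the left-hand side becomes the first argument of round().
piped <- x |> round(digits = 1)

# Check whether the two approaches give the same result.
identical(nested, piped)
```

Any remaining arguments (here `digits = 1`) are written inside the parentheses
on the right-hand side, exactly as they would be in the nested call.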

> Note: In this textbook, we will be using the base R pipe operator syntax, `|>`.
> This base R `|>` pipe operator was inspired by a previous version of the pipe
> operator, `%>%`. The `%>%` pipe operator is not built into R
> and is from the `magrittr` R package.
> The `tidyverse` metapackage imports the `%>%` pipe operator via `dplyr`
> (which in turn imports the `magrittr` R package).
> In more advanced R use related to sharing and distributing code as R packages,
> there are some other drawbacks to using `%>%` compared to `|>`;
> however, these are beyond the scope of this textbook.
> We include this note to make the reader aware that `%>%` exists,
> as it is still commonly used in data analysis code and in many data science
> books and other resources.
> In most cases these two pipes are interchangeable and either can be used.
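
To illustrate the interchangeability mentioned in the note, here is a small
sketch that runs the same `filter` step with each pipe. It assumes the `dplyr`
package is installed (loading it makes `%>%` available) and R 4.1 or later.
The `toy_lang` data frame and its counts are hypothetical, invented only for
this example.

```r
library(dplyr) # also makes the %>% pipe available

# Hypothetical toy data; the counts are invented for illustration.
toy_lang <- tibble(
  region = c("Vancouver", "Toronto", "Vancouver"),
  language = c("A", "B", "C"),
  most_at_home = c(100, 200, 300)
)

# The same step written with the base R pipe and with the magrittr pipe.
with_base_pipe <- toy_lang |> filter(region == "Vancouver")
with_magrittr_pipe <- toy_lang %>% filter(region == "Vancouver")

# Check whether both pipes produced the same data frame.
identical(with_base_pipe, with_magrittr_pipe)
```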

### Using `|>` to combine `filter` and `select`

Let's work with our tidy `tidy_lang` data set from section \@ref(separate), which contains
@@ -1037,11 +1037,15 @@ Suppose we want to create a subset of the data with only the languages and
counts of each language spoken most at home for the city of Vancouver. To do
this, we can use the functions `filter` and `select`. First, we use `filter` to
create a data frame called `van_data` that contains only values for Vancouver.
We then use `select` on this data frame to keep only the variables we want:

``` {r}
van_data <- filter(tidy_lang, region == "Vancouver")
van_data
```

We then use `select` on this data frame to keep only the variables we want:

``` {r}
van_data_selected <- select(van_data, language, most_at_home)
van_data_selected
```
@@ -1069,19 +1073,18 @@ approach is more clear and readable.

### Using `|>` with more than two functions

The `|>` can be used with any function in R. Additionally, we can pipe together
The pipe operator (`|>`) can be used with any function in R. Additionally, we can pipe together
more than two functions. For example, we can pipe together three functions to:

- order the rows by counts of the language most spoken at home from smallest to largest
- only include the counts that are more than 10,000 and
- only include the region, language and count of Canadians reporting their primary language at home columns in our table.
- `filter` rows to include only those where the counts of the language most spoken at home are greater than 10,000,
- `select` only the columns corresponding to `region`, `language` and `most_at_home`, and
- `arrange` the data frame rows in order by counts of the language most spoken at home
from smallest to largest.

To order by counts of the language most spoken at home we will use the
`tidyverse` function, `arrange`. Here we use only one column for sorting (`most_at_home`),
but more than one can also be used. To do this, list additional columns
separated by commas. The order they are listed in indicates the order in which
they will be used for sorting. This is much like how an English dictionary sorts
words: first by the first letter, then by the second letter, and so on.
> **Note:** As we saw in Chapter \@ref(intro), we can use the `tidyverse` `arrange` function
> to order the rows in the data frame by the values of one or more columns.
> Here we pass the column name `most_at_home` to `arrange` to order the data frame rows
> by the values in that column, in ascending order.
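
As a rough sketch of ordering by one column versus several, the snippet below
uses a hypothetical `toy_lang` data frame with invented counts (not the census
data). It assumes `dplyr` is installed and R 4.1 or later. When more than one
column is passed to `arrange`, the columns are used in the order listed.

```r
library(dplyr)

# Hypothetical toy data; the counts are invented for illustration.
toy_lang <- tibble(
  region = c("Toronto", "Vancouver", "Toronto", "Vancouver"),
  language = c("A", "B", "C", "D"),
  most_at_home = c(300, 100, 200, 400)
)

# One sorting column: rows ordered by most_at_home, smallest to largest.
by_count <- toy_lang |> arrange(most_at_home)

# Two sorting columns: rows ordered by region first,
# then by most_at_home within each region.
by_region_then_count <- toy_lang |> arrange(region, most_at_home)
by_region_then_count
```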

``` {r}
large_region_lang <- filter(tidy_lang, most_at_home > 10000) |>
@@ -1116,9 +1119,9 @@ still want to do these things. For example, you might store temporary objects
because you want to save prepared data before feeding it into a plot function
so you can iteratively change the plot without having to
redo all of your data transformations each time you create a new plot.
Additionally, piping many functions can be overwhelming, so you may also want to
store a temporary object midway through and pipe that into more functions after
that.
Additionally, piping many functions can be overwhelming and difficult to debug,
so you may also want to store a temporary object midway through
and pipe that into more functions after that.

## Aggregating data with `group_by` + `summarize`