Wrangling chapter review #235

Merged
merged 72 commits into from
Sep 27, 2021
5b1fc62
updating the wide/long sections as per reviewer E suggestions
leem44 Jun 27, 2021
fb7c142
adding convert argument to separate function
leem44 Jun 28, 2021
6b31e5c
updating mutate section to account for new convert in separate sectio…
leem44 Jun 28, 2021
c03342a
editing piping section to include when you might use temporary object…
leem44 Jun 28, 2021
bea4ae2
updating piping section as per Reviewer Cs suggestion
leem44 Jun 28, 2021
8c4e76c
adding why wide is bad
leem44 Jun 28, 2021
7b729c2
minor changes
leem44 Jun 28, 2021
bb989f6
changing column width in lang_long table so we can read all the rows
leem44 Jun 28, 2021
5f2d235
adding example for tibble
leem44 Jun 28, 2021
4a11419
adding summarize_if
leem44 Jun 28, 2021
31cdcdf
adding section about select helpers
leem44 Jun 29, 2021
a0f9749
added section on summarize +across
leem44 Jun 29, 2021
9b9f0c6
doing a pass through the chapter and editing grammar/spelling/logic
leem44 Jun 29, 2021
80df294
removing summarize_if since we have across
leem44 Jun 29, 2021
59dedd9
minor change
leem44 Jun 29, 2021
4a7b8a6
updating numbers in pivot longer table to match data frame
Jul 8, 2021
b94e2dc
merging with remote branch
Jul 8, 2021
fd487c5
adding image explanation for separate function
Jul 8, 2021
23a7442
moving section down to additional resources
Jul 29, 2021
c5b8533
wrapping text
Jul 29, 2021
c5dbe60
making corrections to the formatting
Jul 29, 2021
b74c232
fixing code box in wrong place
Aug 16, 2021
c74feab
updating numbers in pivot longer table to match data frame
Jul 8, 2021
48ad252
updating the wide/long sections as per reviewer E suggestions
leem44 Jun 27, 2021
c7206bb
adding convert argument to separate function
leem44 Jun 28, 2021
86e1b0d
updating mutate section to account for new convert in separate sectio…
leem44 Jun 28, 2021
3b6c87e
editing piping section to include when you might use temporary object…
leem44 Jun 28, 2021
0e969d5
updating piping section as per Reviewer Cs suggestion
leem44 Jun 28, 2021
6d5536a
adding why wide is bad
leem44 Jun 28, 2021
d7bdff8
minor changes
leem44 Jun 28, 2021
e23cc4d
changing column width in lang_long table so we can read all the rows
leem44 Jun 28, 2021
2bfdbcd
adding example for tibble
leem44 Jun 28, 2021
955d111
adding summarize_if
leem44 Jun 28, 2021
6a415d1
adding section about select helpers
leem44 Jun 29, 2021
316b1c0
added section on summarize +across
leem44 Jun 29, 2021
6f1d40d
doing a pass through the chapter and editing grammar/spelling/logic
leem44 Jun 29, 2021
d0be545
removing summarize_if since we have across
leem44 Jun 29, 2021
05da9ae
minor change
leem44 Jun 29, 2021
6547353
adding image explanation for separate function
Jul 8, 2021
ae49617
moving section down to additional resources
Jul 29, 2021
70d903c
wrapping text
Jul 29, 2021
e7f7172
making corrections to the formatting
Jul 29, 2021
81eb7e3
fixing code box in wrong place
Aug 16, 2021
16f55f7
renamed wrangling
Aug 16, 2021
54b40c5
editing map paragraph since we added summarize and across
Aug 16, 2021
84b50a4
adding section on rowwise
Aug 16, 2021
f2bdb3e
addressing reviewer Ds comments (adding table of summary wrangling fu…
Aug 16, 2021
2490eac
changing text up to 3.4.3: reordering vector, data frame, list sectio…
Aug 17, 2021
467066f
going through 3.4.3 - 3.5.1 and fixing the writing
Aug 17, 2021
2fe8478
fixing the writing for clarity, updating the explanation of pull
Aug 17, 2021
a217d0e
one last editing pass in the second half of the chapter
Aug 17, 2021
89d2ffd
split functions and operators across two bullets and alphabetized the…
ttimbers Sep 19, 2021
d3a29bc
combined row and observation figure into one, as the explanation seem…
ttimbers Sep 19, 2021
7af9dbb
swapped example to be character, not integer - because we call it an …
ttimbers Sep 19, 2021
ccb1d0e
fixed typo calling vector year that I changed to region
ttimbers Sep 19, 2021
919701c
a few more image changes to go with the character vector example
ttimbers Sep 19, 2021
0acd051
reviewed up until Tidy data
ttimbers Sep 19, 2021
2ecf1e0
changed some headers to title case to be consistent
ttimbers Sep 19, 2021
5551639
worked on tidying from wider to longer wording and removed loading da…
ttimbers Sep 20, 2021
0df1d18
fixed wrong tidy image
ttimbers Sep 22, 2021
3fbf22e
wording changes to tidy data section
ttimbers Sep 22, 2021
6858f74
wording changes up to the end of mutate
ttimbers Sep 22, 2021
a34e1ac
improved image size for fig 02-plot
ttimbers Sep 22, 2021
93d366e
simplified mutate as a new column example
ttimbers Sep 22, 2021
cd611ba
small plot changes related to simplifuing the mutate example
ttimbers Sep 22, 2021
3a651da
wording changes for the pipe section
ttimbers Sep 23, 2021
d03b1d0
edited wording in rowwise section
ttimbers Sep 23, 2021
15ec161
reorganized and simplied the summarize/purrr map/rowwise section. Sti…
ttimbers Sep 25, 2021
ec9a2af
added NA section for summarize + across
ttimbers Sep 26, 2021
2bc1b8f
Fixed images for pivoting and added images for aggregating
ttimbers Sep 26, 2021
ee93e23
tried to keep most text and all code to 80 characters
ttimbers Sep 26, 2021
5d2abc1
merging dev into wrangling
ttimbers Sep 27, 2021
adding why wide is bad
leem44 committed Jun 28, 2021
commit 8c4e76c1b2b72ee988aea37a0b006dc135b18c69
4 changes: 2 additions & 2 deletions 02-wrangling.Rmd
@@ -265,7 +265,7 @@ lang_long <- read_csv("data/region_lang_top5_cities_long.csv")
lang_long
```

-What is wrong with this format above? In this example, each observation should be a language in a region. However, in the messy data set above, each observation is split across multiple rows. One where the count for `most_at_home` is recorded and the other where the count for `most_at_work` is recorded. Suppose we wanted to visualize the relationship between the number of Canadians reporting their primary language at home and work. It would be difficult to do that with the data in its current format.
+What is wrong with this format above? In this example, each observation should be a language in a region. However, in the messy data set above, each observation is split across multiple rows. One where the count for `most_at_home` is recorded and the other where the count for `most_at_work` is recorded. Suppose we wanted to visualize the relationship between the number of Canadians reporting their primary language at home and work. It would be difficult to do that with the data in its current format since these two variables are stored in the same column.

We can see how we would like to transform the data from long to wide with `pivot_wider` in Figure \@ref(fig:img-pivot-wider-table).
```{r img-pivot-wider-table, echo = FALSE, message = FALSE, warning = FALSE, fig.cap = "Going from long to wide with pivot_wider", out.width="1100", fig.retina = 2}
@@ -781,7 +781,7 @@ useful functions, we recommend you checkout [this chapter](http://stat545.com/bl
## Using `purrr`'s `map*` functions to iterate

Where should you turn when you discover the next step in your data wrangling/cleaning process requires you to apply a function to
-each column in a data frame? For example, if you wanted to know the maximum value of each column in a data frame? Well, you could use `summarize`
+each column in a data frame? The process of repeating a function on different columns or different data sets is called **iteration**. For example, suppose you wanted to know the maximum value of each column in a data frame. You could use `summarize`
as discussed above. However, this becomes inconvenient when you have many columns, as `summarize` requires you to type out a column name and a data
transformation for each summary statistic you want to calculate.
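
Reviewer-style illustration of the revised paragraph above: a minimal sketch of why typing one transformation per column in `summarize` gets inconvenient, and how iterating over columns avoids it. The data frame `toy` and its columns `a`, `b`, `c` are hypothetical and not taken from the chapter's data sets; `across` assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(purrr)

# Hypothetical toy data frame, used only for illustration
toy <- tibble(
  a = c(1, 5, 3),
  b = c(10, 2, 8),
  c = c(7, 7, 9)
)

# summarize alone: one column name and one transformation typed per column
max_typed <- toy %>%
  summarize(a = max(a), b = max(b), c = max(c))

# summarize + across: apply max to every column without naming each one
max_across <- toy %>%
  summarize(across(everything(), max))

# purrr: iterate over the columns, returning a named numeric vector
max_mapped <- map_dbl(toy, max)
```

All three approaches compute the same per-column maxima; the first grows by one argument for every extra column, while the other two stay the same length no matter how many columns the data frame has.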