From 9554c61787dafbd152bad2d33297d1391a208533 Mon Sep 17 00:00:00 2001 From: brymz Date: Thu, 8 Dec 2016 09:44:43 -0600 Subject: [PATCH 1/5] introduction.md: add prereq sect --- _episodes/01-introduction.md | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/_episodes/01-introduction.md b/_episodes/01-introduction.md index e0267ec..e0389e5 100644 --- a/_episodes/01-introduction.md +++ b/_episodes/01-introduction.md @@ -25,7 +25,9 @@ keypoints: and graphical representation of the results." --- -**Welcome to the Software Carpentry lesson on Data Visualization for novices** +## Welcome to the Software Carpentry lesson on Data Visualization for novices + +### Setup Make sure you have followed the [setup instructions][setup] before moving onto the next episodes. @@ -40,5 +42,9 @@ install.packages("ggplot2") ~~~ {: .r} +### Prerequisites + +This lesson is designed for novice programmers that are already familiar with basic data management using basic R function and the `dplyr` package. + [setup]: {{ site.baseurl }}/setup/ From d5e15bb691f48549ae0d082ab12cb2110e410b60 Mon Sep 17 00:00:00 2001 From: brymz Date: Thu, 8 Dec 2016 12:46:53 -0600 Subject: [PATCH 2/5] 01-introduction: draft episode content --- _episodes/01-introduction.md | 32 ++++++++++++++++++++++++++++++-- 1 file changed, 30 insertions(+), 2 deletions(-) diff --git a/_episodes/01-introduction.md b/_episodes/01-introduction.md index e0389e5..561dbe1 100644 --- a/_episodes/01-introduction.md +++ b/_episodes/01-introduction.md @@ -44,7 +44,35 @@ install.packages("ggplot2") ### Prerequisites -This lesson is designed for novice programmers that are already familiar with basic data management using basic R function and the `dplyr` package. +This lesson is designed for novice programmers that already have some practice +with basic data management using base R functions and the `dplyr` package. 
The +following lessons, [Data Management in R][data-management] and +[`dplyr` Basics][dplyr], provide a refresher for the prerequisites and an +opportunity to prepare the [data][data] for the lesson examples. -[setup]: {{ site.baseurl }}/setup/ +### Audience + +Have you ever seen a figure that tried to convey way too much? Or a figure that just didn't seem to give you any useful information? + +This lesson is for scientists who are getting started using R to analyze their +data, want to know more about data visualization using `ggplot2`, and want to +practice choosing the best figure to present their data and data analysis. The +lesson should prepare you to choose and generate the best publication-quality chart to answer your research question. + +### Goals +This lesson will walk you through the steps to generate a publication-quality +chart that directly addresses a research question, including the steps to: + + - interactively generate analysis code and save it in a file + - read tabular data into a data frame + - manipulate and summarize tabular data using `dplyr` + - generate publication-quality charts using `ggplot2` + +Interactive participation in the lesson should also give you a start at critiquing +others' choices of charts and aesthetics and expressing how to improve them. 
+ +[setup]: {{ site.baseurl }}/setup/ +[data-management]: {{ site.baseurl }}/02-data-management/ +[dplyr]: {{ site.baseurl }}/04-dplyr/ +[data]: {{ site.baseurl }}/data/gapminder_all.csv From 33eefa525fdbd5780a2949bce19b9294a7450e47 Mon Sep 17 00:00:00 2001 From: brymz Date: Thu, 8 Dec 2016 13:40:44 -0600 Subject: [PATCH 3/5] 02-data-management.md: draft episode content --- _episodes/02-data-management.md | 213 ++++++++++++++++++-------------- 1 file changed, 122 insertions(+), 91 deletions(-) diff --git a/_episodes/02-data-management.md b/_episodes/02-data-management.md index 6e189fc..2b53abd 100644 --- a/_episodes/02-data-management.md +++ b/_episodes/02-data-management.md @@ -9,119 +9,148 @@ objectives: - "To read Gapminder data to R" - "To evaluate Gapminder data structure" keypoints: -- "Be sure to `setwd()` to point to your data file before importing it." +- "Be sure to set up an RStudio project or `setwd()` to point to your data file + before importing it." - "Import data using `read.csv()`." - "Familiarize yourself with your data and its structure prior to analysis." --- +## Import your data. + +Be sure that the [data][data] is downloaded to a project folder that is connected to your RStudio project or to the working directory of your R console. + +Import the data using `read.csv()`. 
+ ~~~ data <- read.csv("gapminder_all.csv") -head(data) ~~~ {: .r} -~~~ - continent country gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 -1 Africa Algeria 2449.0082 3013.9760 2550.8169 3246.9918 4182.6638 -2 Africa Angola 3520.6103 3827.9405 4269.2767 5522.7764 5473.2880 -3 Africa Benin 1062.7522 959.6011 949.4991 1035.8314 1085.7969 -4 Africa Botswana 851.2411 918.2325 983.6540 1214.7093 2263.6111 -5 Africa Burkina Faso 543.2552 617.1835 722.5120 794.8266 854.7360 -6 Africa Burundi 339.2965 379.5646 355.2032 412.9775 464.0995 - gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007 -1 4910.4168 5745.1602 5681.3585 5023.2166 4797.2951 5288.0404 6223.3675 -2 3008.6474 2756.9537 2430.2083 2627.8457 2277.1409 2773.2873 4797.2313 -3 1029.1613 1277.8976 1225.8560 1191.2077 1232.9753 1372.8779 1441.2849 -4 3214.8578 4551.1421 6205.8839 7954.1116 8647.1423 11003.6051 12569.8518 -5 743.3870 807.1986 912.0631 931.7528 946.2950 1037.6452 1217.0330 -6 556.1033 559.6032 621.8188 631.6999 463.1151 446.4035 430.0707 - lifeExp_1952 lifeExp_1957 lifeExp_1962 lifeExp_1967 lifeExp_1972 lifeExp_1977 lifeExp_1982 lifeExp_1987 -1 43.077 45.685 48.303 51.407 54.518 58.014 61.368 65.799 -2 30.015 31.999 34.000 35.985 37.928 39.483 39.942 39.906 -3 38.223 40.358 42.618 44.885 47.014 49.190 50.904 52.337 -4 47.622 49.618 51.520 53.298 56.024 59.319 61.484 63.622 -5 31.975 34.906 37.814 40.697 43.591 46.137 48.122 49.557 -6 39.031 40.533 42.045 43.548 44.057 45.910 47.471 48.211 - lifeExp_1992 lifeExp_1997 lifeExp_2002 lifeExp_2007 pop_1952 pop_1957 pop_1962 pop_1967 pop_1972 pop_1977 -1 67.744 69.152 70.994 72.301 9279525 10270856 11000948 12760499 14760787 17152804 -2 40.647 40.963 41.003 42.731 4232095 4561361 4826015 5247469 5894858 6162675 -3 53.919 54.777 54.406 56.728 1738315 1925173 2151895 2427334 2761407 3168267 -4 62.745 52.556 46.634 50.728 442308 474639 512764 553541 619351 781472 -5 50.260 50.324 
50.650 52.295 4469979 4713416 4919632 5127935 5433886 5889574 -6 44.736 45.326 47.360 49.580 2445618 2667518 2961915 3330989 3529983 3834415 - pop_1982 pop_1987 pop_1992 pop_1997 pop_2002 pop_2007 -1 20033753 23254956 26298373 29072015 31287142 33333216 -2 7016384 7874230 8735988 9875024 10866106 12420476 -3 3641603 4243788 4981671 6066080 7026113 8078314 -4 970347 1151184 1342614 1536536 1630347 1639131 -5 6634596 7586551 8878303 10352843 12251209 14326203 -6 4580410 5126023 5809236 6121610 7021078 8390505 -~~~ -{: .output} +## Learn the structure of your data. + +Some base R functions are useful to check out the structure of your data. + +- `head()` shows all of the columns (or variables) with just the first six rows, + but it is not great for data with a lot of columns. Like this: ~~~ -srt(data) +head(data) ~~~ {: .r} +> ## output +> ~~~ +> continent country gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 gdpPercap_1967 gdpPercap_1972 +> 1 Africa Algeria 2449.0082 3013.9760 2550.8169 3246.9918 4182.6638 +> 2 Africa Angola 3520.6103 3827.9405 4269.2767 5522.7764 5473.2880 +> 3 Africa Benin 1062.7522 959.6011 949.4991 1035.8314 1085.7969 +> 4 Africa Botswana 851.2411 918.2325 983.6540 1214.7093 2263.6111 +> 5 Africa Burkina Faso 543.2552 617.1835 722.5120 794.8266 854.7360 +> 6 Africa Burundi 339.2965 379.5646 355.2032 412.9775 464.0995 +> gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007 +> 1 4910.4168 5745.1602 5681.3585 5023.2166 4797.2951 5288.0404 6223.3675 +> 2 3008.6474 2756.9537 2430.2083 2627.8457 2277.1409 2773.2873 4797.2313 +> 3 1029.1613 1277.8976 1225.8560 1191.2077 1232.9753 1372.8779 1441.2849 +> 4 3214.8578 4551.1421 6205.8839 7954.1116 8647.1423 11003.6051 12569.8518 +> 5 743.3870 807.1986 912.0631 931.7528 946.2950 1037.6452 1217.0330 +> 6 556.1033 559.6032 621.8188 631.6999 463.1151 446.4035 430.0707 +> lifeExp_1952 lifeExp_1957 lifeExp_1962 lifeExp_1967 lifeExp_1972 lifeExp_1977 lifeExp_1982 
lifeExp_1987 +> 1 43.077 45.685 48.303 51.407 54.518 58.014 61.368 65.799 +> 2 30.015 31.999 34.000 35.985 37.928 39.483 39.942 39.906 +> 3 38.223 40.358 42.618 44.885 47.014 49.190 50.904 52.337 +> 4 47.622 49.618 51.520 53.298 56.024 59.319 61.484 63.622 +> 5 31.975 34.906 37.814 40.697 43.591 46.137 48.122 49.557 +> 6 39.031 40.533 42.045 43.548 44.057 45.910 47.471 48.211 +> lifeExp_1992 lifeExp_1997 lifeExp_2002 lifeExp_2007 pop_1952 pop_1957 pop_1962 pop_1967 pop_1972 pop_1977 +> 1 67.744 69.152 70.994 72.301 9279525 10270856 11000948 12760499 14760787 17152804 +> 2 40.647 40.963 41.003 42.731 4232095 4561361 4826015 5247469 5894858 6162675 +> 3 53.919 54.777 54.406 56.728 1738315 1925173 2151895 2427334 2761407 3168267 +> 4 62.745 52.556 46.634 50.728 442308 474639 512764 553541 619351 781472 +> 5 50.260 50.324 50.650 52.295 4469979 4713416 4919632 5127935 5433886 5889574 +> 6 44.736 45.326 47.360 49.580 2445618 2667518 2961915 3330989 3529983 3834415 +> pop_1982 pop_1987 pop_1992 pop_1997 pop_2002 pop_2007 +> 1 20033753 23254956 26298373 29072015 31287142 33333216 +> 2 7016384 7874230 8735988 9875024 10866106 12420476 +> 3 3641603 4243788 4981671 6066080 7026113 8078314 +> 4 970347 1151184 1342614 1536536 1630347 1639131 +> 5 6634596 7586551 8878303 10352843 12251209 14326203 +> 6 4580410 5126023 5809236 6121610 7021078 8390505 +> ~~~ +> {: .output} +{: .solution} + +- `str()` gives the "shape" of the data by observations and variables and lists + all of the variables and their `class()`. + ~~~ -'data.frame': 142 obs. of 38 variables: - $ continent : Factor w/ 5 levels "Africa","Americas",..: 1 1 1 1 1 1 1 1 1 1 ... - $ country : Factor w/ 142 levels "Afghanistan",..: 3 4 11 14 17 18 20 22 23 27 ... - $ gdpPercap_1952: num 2449 3521 1063 851 543 ... - $ gdpPercap_1957: num 3014 3828 960 918 617 ... - $ gdpPercap_1962: num 2551 4269 949 984 723 ... - $ gdpPercap_1967: num 3247 5523 1036 1215 795 ... - $ gdpPercap_1972: num 4183 5473 1086 2264 855 ... 
- $ gdpPercap_1977: num 4910 3009 1029 3215 743 ... - $ gdpPercap_1982: num 5745 2757 1278 4551 807 ... - $ gdpPercap_1987: num 5681 2430 1226 6206 912 ... - $ gdpPercap_1992: num 5023 2628 1191 7954 932 ... - $ gdpPercap_1997: num 4797 2277 1233 8647 946 ... - $ gdpPercap_2002: num 5288 2773 1373 11004 1038 ... - $ gdpPercap_2007: num 6223 4797 1441 12570 1217 ... - $ lifeExp_1952 : num 43.1 30 38.2 47.6 32 ... - $ lifeExp_1957 : num 45.7 32 40.4 49.6 34.9 ... - $ lifeExp_1962 : num 48.3 34 42.6 51.5 37.8 ... - $ lifeExp_1967 : num 51.4 36 44.9 53.3 40.7 ... - $ lifeExp_1972 : num 54.5 37.9 47 56 43.6 ... - $ lifeExp_1977 : num 58 39.5 49.2 59.3 46.1 ... - $ lifeExp_1982 : num 61.4 39.9 50.9 61.5 48.1 ... - $ lifeExp_1987 : num 65.8 39.9 52.3 63.6 49.6 ... - $ lifeExp_1992 : num 67.7 40.6 53.9 62.7 50.3 ... - $ lifeExp_1997 : num 69.2 41 54.8 52.6 50.3 ... - $ lifeExp_2002 : num 71 41 54.4 46.6 50.6 ... - $ lifeExp_2007 : num 72.3 42.7 56.7 50.7 52.3 ... - $ pop_1952 : num 9279525 4232095 1738315 442308 4469979 ... - $ pop_1957 : num 10270856 4561361 1925173 474639 4713416 ... - $ pop_1962 : num 11000948 4826015 2151895 512764 4919632 ... - $ pop_1967 : num 12760499 5247469 2427334 553541 5127935 ... - $ pop_1972 : num 14760787 5894858 2761407 619351 5433886 ... - $ pop_1977 : num 17152804 6162675 3168267 781472 5889574 ... - $ pop_1982 : num 20033753 7016384 3641603 970347 6634596 ... - $ pop_1987 : num 23254956 7874230 4243788 1151184 7586551 ... - $ pop_1992 : num 26298373 8735988 4981671 1342614 8878303 ... - $ pop_1997 : num 29072015 9875024 6066080 1536536 10352843 ... - $ pop_2002 : int 31287142 10866106 7026113 1630347 12251209 7021078 15929988 4048013 8835739 614382 ... - $ pop_2007 : int 33333216 12420476 8078314 1639131 14326203 8390505 17696293 4369038 10238807 710960 ... +str(data) ~~~ -{: .output} +{: .r} + +> ## output +> ~~~ +> 'data.frame': 142 obs. 
of 38 variables: +> $ continent : Factor w/ 5 levels "Africa","Americas",..: 1 1 1 1 1 1 1 1 1 1 ... +> $ country : Factor w/ 142 levels "Afghanistan",..: 3 4 11 14 17 18 20 22 23 27 ... +> $ gdpPercap_1952: num 2449 3521 1063 851 543 ... +> $ gdpPercap_1957: num 3014 3828 960 918 617 ... +> $ gdpPercap_1962: num 2551 4269 949 984 723 ... +> $ gdpPercap_1967: num 3247 5523 1036 1215 795 ... +> $ gdpPercap_1972: num 4183 5473 1086 2264 855 ... +> $ gdpPercap_1977: num 4910 3009 1029 3215 743 ... +> $ gdpPercap_1982: num 5745 2757 1278 4551 807 ... +> $ gdpPercap_1987: num 5681 2430 1226 6206 912 ... +> $ gdpPercap_1992: num 5023 2628 1191 7954 932 ... +> $ gdpPercap_1997: num 4797 2277 1233 8647 946 ... +> $ gdpPercap_2002: num 5288 2773 1373 11004 1038 ... +> $ gdpPercap_2007: num 6223 4797 1441 12570 1217 ... +> $ lifeExp_1952 : num 43.1 30 38.2 47.6 32 ... +> $ lifeExp_1957 : num 45.7 32 40.4 49.6 34.9 ... +> $ lifeExp_1962 : num 48.3 34 42.6 51.5 37.8 ... +> $ lifeExp_1967 : num 51.4 36 44.9 53.3 40.7 ... +> $ lifeExp_1972 : num 54.5 37.9 47 56 43.6 ... +> $ lifeExp_1977 : num 58 39.5 49.2 59.3 46.1 ... +> $ lifeExp_1982 : num 61.4 39.9 50.9 61.5 48.1 ... +> $ lifeExp_1987 : num 65.8 39.9 52.3 63.6 49.6 ... +> $ lifeExp_1992 : num 67.7 40.6 53.9 62.7 50.3 ... +> $ lifeExp_1997 : num 69.2 41 54.8 52.6 50.3 ... +> $ lifeExp_2002 : num 71 41 54.4 46.6 50.6 ... +> $ lifeExp_2007 : num 72.3 42.7 56.7 50.7 52.3 ... +> $ pop_1952 : num 9279525 4232095 1738315 442308 4469979 ... +> $ pop_1957 : num 10270856 4561361 1925173 474639 4713416 ... +> $ pop_1962 : num 11000948 4826015 2151895 512764 4919632 ... +> $ pop_1967 : num 12760499 5247469 2427334 553541 5127935 ... +> $ pop_1972 : num 14760787 5894858 2761407 619351 5433886 ... +> $ pop_1977 : num 17152804 6162675 3168267 781472 5889574 ... +> $ pop_1982 : num 20033753 7016384 3641603 970347 6634596 ... +> $ pop_1987 : num 23254956 7874230 4243788 1151184 7586551 ... 
+> $ pop_1992 : num 26298373 8735988 4981671 1342614 8878303 ... +> $ pop_1997 : num 29072015 9875024 6066080 1536536 10352843 ... +> $ pop_2002 : int 31287142 10866106 7026113 1630347 12251209 7021078 15929988 4048013 8835739 614382 ... +> $ pop_2007 : int 33333216 12420476 8078314 1639131 14326203 8390505 17696293 4369038 10238807 710960 ... +> ~~~ +> {: .output} +{: .solution} + +- `names()` generates a vector of all the variables in the data. ~~~ names(data) ~~~ {: .r} -~~~ - [1] "continent" "country" "gdpPercap_1952" "gdpPercap_1957" "gdpPercap_1962" "gdpPercap_1967" - [7] "gdpPercap_1972" "gdpPercap_1977" "gdpPercap_1982" "gdpPercap_1987" "gdpPercap_1992" "gdpPercap_1997" -[13] "gdpPercap_2002" "gdpPercap_2007" "lifeExp_1952" "lifeExp_1957" "lifeExp_1962" "lifeExp_1967" -[19] "lifeExp_1972" "lifeExp_1977" "lifeExp_1982" "lifeExp_1987" "lifeExp_1992" "lifeExp_1997" -[25] "lifeExp_2002" "lifeExp_2007" "pop_1952" "pop_1957" "pop_1962" "pop_1967" -[31] "pop_1972" "pop_1977" "pop_1982" "pop_1987" "pop_1992" "pop_1997" -[37] "pop_2002" "pop_2007" -~~~ -{: .output} +> ## output +> ~~~ +> [1] "continent" "country" "gdpPercap_1952" "gdpPercap_1957" "gdpPercap_1962" "gdpPercap_1967" +> [7] "gdpPercap_1972" "gdpPercap_1977" "gdpPercap_1982" "gdpPercap_1987" "gdpPercap_1992" "gdpPercap_1997" +> [13] "gdpPercap_2002" "gdpPercap_2007" "lifeExp_1952" "lifeExp_1957" "lifeExp_1962" "lifeExp_1967" +> [19] "lifeExp_1972" "lifeExp_1977" "lifeExp_1982" "lifeExp_1987" "lifeExp_1992" "lifeExp_1997" +> [25] "lifeExp_2002" "lifeExp_2007" "pop_1952" "pop_1957" "pop_1962" "pop_1967" +> [31] "pop_1972" "pop_1977" "pop_1982" "pop_1987" "pop_1992" "pop_1997" +> [37] "pop_2002" "pop_2007" +> ~~~ +> {: .output} +{: .solution} > ## Data Structures Challenge > @@ -149,3 +178,5 @@ names(data) > > B. 
38 > {: .solution} {: .challenge} + +[data]: {{ site.baseurl }}/data/gapminder_all.csv \ No newline at end of file From d33e9bca57af2b5fcd954b6cc8c4ee1c82bcc989 Mon Sep 17 00:00:00 2001 From: brymz Date: Thu, 8 Dec 2016 14:48:45 -0600 Subject: [PATCH 4/5] 03-data-structures.md: draft episode content --- _episodes/03-data-structures.md | 27 +++++++++++++++++++++++++++ 1 file changed, 27 insertions(+) diff --git a/_episodes/03-data-structures.md b/_episodes/03-data-structures.md index cb484cd..5c43690 100644 --- a/_episodes/03-data-structures.md +++ b/_episodes/03-data-structures.md @@ -23,6 +23,23 @@ keypoints: categories." --- +## Data structure is the shape and content of your data. + +The shape of your data ought to be rectangular and can be described by the number of rows and columns. + +Columns in your data can also be called 'variables' or 'fields'. However, the term 'variable', used generally for any column, can also be used more specifically to describe numerically continuous data. As a complement to this specific use of 'variables', 'categories' describe discrete or categorical data that can be organized in groups. + +'Values' make up the rows of the data and are more specifically associated with +cells in a data table. 'Values' can represent numerical or categorical data. +Numerical values should be recognized as 'absolute' or 'relative'. 'Absolute' +values receive context by their units, while 'relative' values are standardized +in some fashion (e.g., as a proportion or per unit) and most often used for comparison +among categories. + +Rows of data may represent observational or experimental 'replicates'. Most +generally, 'replicates' are values with similar variables and categories that +are evaluated in data analysis. + > ## Data Organization Challenge > > What years are represented in the Gapminder data? @@ -42,3 +59,13 @@ keypoints: > > A. 
`country`, because it is a 'categorical' variable > {: .solution} {: .challenge} + +## Tidy data follows a set of rules. + +These rules keep data well organized and ready for analysis. + +1. Order doesn’t matter +2. No duplicate rows +3. Every cell contains one value +4. One column per type of information +5. No redundant information From ad601da1c46ce3f87e566a3088adf187d812df36 Mon Sep 17 00:00:00 2001 From: brymz Date: Thu, 8 Dec 2016 16:02:57 -0600 Subject: [PATCH 5/5] 04-dplyr.md: draft episode content --- _episodes/04-dplyr.md | 62 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 62 insertions(+) diff --git a/_episodes/04-dplyr.md b/_episodes/04-dplyr.md index ae2f3b7..cce650c 100644 --- a/_episodes/04-dplyr.md +++ b/_episodes/04-dplyr.md @@ -12,11 +12,71 @@ keypoints: - "Use `dplyr` to manipulate, summarize, and analyze your data." --- +## A *really* quick refresher on `dplyr` + +`dplyr` is a package that needs to be installed and then loaded in your R session. +You should have already installed it using `install.packages("dplyr")`. + ~~~ library(dplyr) ~~~ {: .r} +For this lesson we will primarily use the basic set of `dplyr` functions +(`select()`, `filter()`, `group_by()`, and `summarize()`) to manipulate the data +to the form required for our analysis and visualization. A more in-depth +introduction can be found [here][dplyr-vingette]. + +Each of the `dplyr` functions takes a `data.frame` (or `tibble`) as its first +argument. + +`select()` takes a list of column names and returns a data frame with only those columns. + +~~~ +select(data, country, pop_1952) +~~~ +{: .r} + +`filter()` takes a conditional statement and returns a data frame with the subset of rows of `data` that meet the condition. + +~~~ +filter(data, pop_1952 < 500000) +~~~ +{: .r} + +Often, the input data frame is the result of another `dplyr` function. So, +functions can be nested or piped (`%>%`) to link the data manipulation steps. 
+ +~~~ +select(filter(data, pop_1952 < 500000), country, pop_1952) +~~~ +{: .r} + +~~~ +data %>% + filter(pop_1952 < 500000) %>% + select(country, pop_1952) +~~~ +{: .r} + +`summarize()` uses summary functions (e.g., `min()`, `max()`, `sum()`, +`mean()`) to collapse the data into new summary values. + +~~~ +summarize(data, count = n(), min_pop1952 = min(pop_1952)) +~~~ +{: .r} + +`group_by()` takes one or more categorical columns and groups the data so that the `summarize()` calculations are made within each group. + +~~~ +data %>% + group_by(continent) %>% + summarize(sum_pop1952 = sum(pop_1952), + avg_pop1952 = mean(pop_1952)) +~~~ +{: .r} + > ## dplyr Challenge > > How many countries are in Africa? @@ -68,3 +128,5 @@ library(dplyr) > > {: .r} > {: .solution} {: .challenge} + +[dplyr-vingette]: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html \ No newline at end of file