From 9554c61787dafbd152bad2d33297d1391a208533 Mon Sep 17 00:00:00 2001
From: brymz <z.t.brym@gmail.com>
Date: Thu, 8 Dec 2016 09:44:43 -0600
Subject: [PATCH 1/5] introduction.md: add prereq sect

---
 _episodes/01-introduction.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/_episodes/01-introduction.md b/_episodes/01-introduction.md
index e0267ec..e0389e5 100644
--- a/_episodes/01-introduction.md
+++ b/_episodes/01-introduction.md
@@ -25,7 +25,9 @@ keypoints:
   and graphical representation of the results."
 ---
 
-**Welcome to the Software Carpentry lesson on Data Visualization for novices**
+## Welcome to the Software Carpentry lesson on Data Visualization for novices
+
+### Setup
 
 Make sure you have followed the [setup instructions][setup] before moving onto
 the next episodes.
@@ -40,5 +42,9 @@ install.packages("ggplot2")
 ~~~
 {: .r}
 
+### Prerequisites
+
+This lesson is designed for novice programmers that are already familiar with basic data management using basic R function and the `dplyr` package.
+
 [setup]: {{ site.baseurl }}/setup/
 

From d5e15bb691f48549ae0d082ab12cb2110e410b60 Mon Sep 17 00:00:00 2001
From: brymz <z.t.brym@gmail.com>
Date: Thu, 8 Dec 2016 12:46:53 -0600
Subject: [PATCH 2/5] 01-introduction: draft episode content

---
 _episodes/01-introduction.md | 32 ++++++++++++++++++++++++++++++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/_episodes/01-introduction.md b/_episodes/01-introduction.md
index e0389e5..561dbe1 100644
--- a/_episodes/01-introduction.md
+++ b/_episodes/01-introduction.md
@@ -44,7 +44,35 @@ install.packages("ggplot2")
 
 ### Prerequisites
 
-This lesson is designed for novice programmers that are already familiar with basic data management using basic R function and the `dplyr` package.
+This lesson is designed for novice programmers who already have some practice
+with basic data management using base R functions and the `dplyr` package. The
+lessons [Data Management in R][data-management] and
+[`dplyr` Basics][dplyr] provide a refresher on these prerequisites and an
+opportunity to prepare the [data][data] for the lesson examples.
 
-[setup]: {{ site.baseurl }}/setup/
+### Audience
+
+Have you ever seen a figure that tried to convey way too much? Or a figure that just didn't seem to give you any useful information?
+
+This lesson is for scientists who are getting started using R to analyze their
+data, want to learn more about data visualization with `ggplot2`, and want to
+practice choosing the best figure to present their data and analysis. The
+lesson should prepare you to choose and generate the best publication-quality chart to answer your research question.
+
+### Goals
 
+This lesson will walk you through the steps to generate a publication-quality
+chart that directly addresses a research question, including the steps to:
+
+   - interactively generate analysis code and save it in a file
+   - read tabular data into a data frame
+   - manipulate and summarize tabular data using `dplyr`
+   - generate publication-quality charts using `ggplot2`
+
+Participating interactively in the lesson should also help you start critiquing
+others' choices of charts and aesthetics and explaining how to improve them.
+
+[setup]: {{ site.baseurl }}/setup/
+[data-management]: {{ site.baseurl }}/02-data-management/
+[dplyr]: {{ site.baseurl }}/04-dplyr/
+[data]: {{ site.baseurl }}/data/gapminder_all.csv

From 33eefa525fdbd5780a2949bce19b9294a7450e47 Mon Sep 17 00:00:00 2001
From: brymz <z.t.brym@gmail.com>
Date: Thu, 8 Dec 2016 13:40:44 -0600
Subject: [PATCH 3/5] 02-data-management.md: draft episode content

---
 _episodes/02-data-management.md | 213 ++++++++++++++++++--------------
 1 file changed, 122 insertions(+), 91 deletions(-)

diff --git a/_episodes/02-data-management.md b/_episodes/02-data-management.md
index 6e189fc..2b53abd 100644
--- a/_episodes/02-data-management.md
+++ b/_episodes/02-data-management.md
@@ -9,119 +9,148 @@ objectives:
 - "To read Gapminder data to R"
 - "To evaluate Gapminder data structure"
 keypoints:
-- "Be sure to `setwd()` to point to your data file before importing it."
+- "Be sure to set up an RStudio project or use `setwd()` to point to your data file
+  before importing it."
 - "Import data using `read.csv()`."
 - "Familiarize yourself with your data and its structure prior to analysis."
 ---
 
+## Import your data.
+
+Be sure that the [data][data] is downloaded to the folder of your RStudio project or to the working directory of your R console.
+
+Import the data using `read.csv()`.
+ 
 ~~~
 data <- read.csv("gapminder_all.csv")
-head(data)
 ~~~
 {: .r}
 
-~~~
-  continent      country gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
-1    Africa      Algeria      2449.0082      3013.9760      2550.8169      3246.9918      4182.6638
-2    Africa       Angola      3520.6103      3827.9405      4269.2767      5522.7764      5473.2880
-3    Africa        Benin      1062.7522       959.6011       949.4991      1035.8314      1085.7969
-4    Africa     Botswana       851.2411       918.2325       983.6540      1214.7093      2263.6111
-5    Africa Burkina Faso       543.2552       617.1835       722.5120       794.8266       854.7360
-6    Africa      Burundi       339.2965       379.5646       355.2032       412.9775       464.0995
-  gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007
-1      4910.4168      5745.1602      5681.3585      5023.2166      4797.2951      5288.0404      6223.3675
-2      3008.6474      2756.9537      2430.2083      2627.8457      2277.1409      2773.2873      4797.2313
-3      1029.1613      1277.8976      1225.8560      1191.2077      1232.9753      1372.8779      1441.2849
-4      3214.8578      4551.1421      6205.8839      7954.1116      8647.1423     11003.6051     12569.8518
-5       743.3870       807.1986       912.0631       931.7528       946.2950      1037.6452      1217.0330
-6       556.1033       559.6032       621.8188       631.6999       463.1151       446.4035       430.0707
-  lifeExp_1952 lifeExp_1957 lifeExp_1962 lifeExp_1967 lifeExp_1972 lifeExp_1977 lifeExp_1982 lifeExp_1987
-1       43.077       45.685       48.303       51.407       54.518       58.014       61.368       65.799
-2       30.015       31.999       34.000       35.985       37.928       39.483       39.942       39.906
-3       38.223       40.358       42.618       44.885       47.014       49.190       50.904       52.337
-4       47.622       49.618       51.520       53.298       56.024       59.319       61.484       63.622
-5       31.975       34.906       37.814       40.697       43.591       46.137       48.122       49.557
-6       39.031       40.533       42.045       43.548       44.057       45.910       47.471       48.211
-  lifeExp_1992 lifeExp_1997 lifeExp_2002 lifeExp_2007 pop_1952 pop_1957 pop_1962 pop_1967 pop_1972 pop_1977
-1       67.744       69.152       70.994       72.301  9279525 10270856 11000948 12760499 14760787 17152804
-2       40.647       40.963       41.003       42.731  4232095  4561361  4826015  5247469  5894858  6162675
-3       53.919       54.777       54.406       56.728  1738315  1925173  2151895  2427334  2761407  3168267
-4       62.745       52.556       46.634       50.728   442308   474639   512764   553541   619351   781472
-5       50.260       50.324       50.650       52.295  4469979  4713416  4919632  5127935  5433886  5889574
-6       44.736       45.326       47.360       49.580  2445618  2667518  2961915  3330989  3529983  3834415
-  pop_1982 pop_1987 pop_1992 pop_1997 pop_2002 pop_2007
-1 20033753 23254956 26298373 29072015 31287142 33333216
-2  7016384  7874230  8735988  9875024 10866106 12420476
-3  3641603  4243788  4981671  6066080  7026113  8078314
-4   970347  1151184  1342614  1536536  1630347  1639131
-5  6634596  7586551  8878303 10352843 12251209 14326203
-6  4580410  5126023  5809236  6121610  7021078  8390505
-~~~
-{: .output}
+## Learn the structure of your data.
+
+Some base R functions are useful to check out the structure of your data.
+
+- `head()` shows all of the columns (or variables) for just the first six rows,
+  but its output is hard to read for data with many columns, as shown here:
 
 ~~~
-srt(data)
+head(data)
 ~~~
 {: .r}
 
+> ## output
+> ~~~
+>   continent      country gdpPercap_1952 gdpPercap_1957 gdpPercap_1962 gdpPercap_1967 gdpPercap_1972
+> 1    Africa      Algeria      2449.0082      3013.9760      2550.8169      3246.9918      4182.6638
+> 2    Africa       Angola      3520.6103      3827.9405      4269.2767      5522.7764      5473.2880
+> 3    Africa        Benin      1062.7522       959.6011       949.4991      1035.8314      1085.7969
+> 4    Africa     Botswana       851.2411       918.2325       983.6540      1214.7093      2263.6111
+> 5    Africa Burkina Faso       543.2552       617.1835       722.5120       794.8266       854.7360
+> 6    Africa      Burundi       339.2965       379.5646       355.2032       412.9775       464.0995
+>   gdpPercap_1977 gdpPercap_1982 gdpPercap_1987 gdpPercap_1992 gdpPercap_1997 gdpPercap_2002 gdpPercap_2007
+> 1      4910.4168      5745.1602      5681.3585      5023.2166      4797.2951      5288.0404      6223.3675
+> 2      3008.6474      2756.9537      2430.2083      2627.8457      2277.1409      2773.2873      4797.2313
+> 3      1029.1613      1277.8976      1225.8560      1191.2077      1232.9753      1372.8779      1441.2849
+> 4      3214.8578      4551.1421      6205.8839      7954.1116      8647.1423     11003.6051     12569.8518
+> 5       743.3870       807.1986       912.0631       931.7528       946.2950      1037.6452      1217.0330
+> 6       556.1033       559.6032       621.8188       631.6999       463.1151       446.4035       430.0707
+>   lifeExp_1952 lifeExp_1957 lifeExp_1962 lifeExp_1967 lifeExp_1972 lifeExp_1977 lifeExp_1982 lifeExp_1987
+> 1       43.077       45.685       48.303       51.407       54.518       58.014       61.368       65.799
+> 2       30.015       31.999       34.000       35.985       37.928       39.483       39.942       39.906
+> 3       38.223       40.358       42.618       44.885       47.014       49.190       50.904       52.337
+> 4       47.622       49.618       51.520       53.298       56.024       59.319       61.484       63.622
+> 5       31.975       34.906       37.814       40.697       43.591       46.137       48.122       49.557
+> 6       39.031       40.533       42.045       43.548       44.057       45.910       47.471       48.211
+>   lifeExp_1992 lifeExp_1997 lifeExp_2002 lifeExp_2007 pop_1952 pop_1957 pop_1962 pop_1967 pop_1972 pop_1977
+> 1       67.744       69.152       70.994       72.301  9279525 10270856 11000948 12760499 14760787 17152804
+> 2       40.647       40.963       41.003       42.731  4232095  4561361  4826015  5247469  5894858  6162675
+> 3       53.919       54.777       54.406       56.728  1738315  1925173  2151895  2427334  2761407  3168267
+> 4       62.745       52.556       46.634       50.728   442308   474639   512764   553541   619351   781472
+> 5       50.260       50.324       50.650       52.295  4469979  4713416  4919632  5127935  5433886  5889574
+> 6       44.736       45.326       47.360       49.580  2445618  2667518  2961915  3330989  3529983  3834415
+>   pop_1982 pop_1987 pop_1992 pop_1997 pop_2002 pop_2007
+> 1 20033753 23254956 26298373 29072015 31287142 33333216
+> 2  7016384  7874230  8735988  9875024 10866106 12420476
+> 3  3641603  4243788  4981671  6066080  7026113  8078314
+> 4   970347  1151184  1342614  1536536  1630347  1639131
+> 5  6634596  7586551  8878303 10352843 12251209 14326203
+> 6  4580410  5126023  5809236  6121610  7021078  8390505
+> ~~~
+> {: .output}
+{: .solution}
+
+- `str()` gives the "shape" of the data by observations and variables and lists
+  all of the variables and their `class()`.
+
 ~~~
-'data.frame':	142 obs. of  38 variables:
- $ continent     : Factor w/ 5 levels "Africa","Americas",..: 1 1 1 1 1 1 1 1 1 1 ...
- $ country       : Factor w/ 142 levels "Afghanistan",..: 3 4 11 14 17 18 20 22 23 27 ...
- $ gdpPercap_1952: num  2449 3521 1063 851 543 ...
- $ gdpPercap_1957: num  3014 3828 960 918 617 ...
- $ gdpPercap_1962: num  2551 4269 949 984 723 ...
- $ gdpPercap_1967: num  3247 5523 1036 1215 795 ...
- $ gdpPercap_1972: num  4183 5473 1086 2264 855 ...
- $ gdpPercap_1977: num  4910 3009 1029 3215 743 ...
- $ gdpPercap_1982: num  5745 2757 1278 4551 807 ...
- $ gdpPercap_1987: num  5681 2430 1226 6206 912 ...
- $ gdpPercap_1992: num  5023 2628 1191 7954 932 ...
- $ gdpPercap_1997: num  4797 2277 1233 8647 946 ...
- $ gdpPercap_2002: num  5288 2773 1373 11004 1038 ...
- $ gdpPercap_2007: num  6223 4797 1441 12570 1217 ...
- $ lifeExp_1952  : num  43.1 30 38.2 47.6 32 ...
- $ lifeExp_1957  : num  45.7 32 40.4 49.6 34.9 ...
- $ lifeExp_1962  : num  48.3 34 42.6 51.5 37.8 ...
- $ lifeExp_1967  : num  51.4 36 44.9 53.3 40.7 ...
- $ lifeExp_1972  : num  54.5 37.9 47 56 43.6 ...
- $ lifeExp_1977  : num  58 39.5 49.2 59.3 46.1 ...
- $ lifeExp_1982  : num  61.4 39.9 50.9 61.5 48.1 ...
- $ lifeExp_1987  : num  65.8 39.9 52.3 63.6 49.6 ...
- $ lifeExp_1992  : num  67.7 40.6 53.9 62.7 50.3 ...
- $ lifeExp_1997  : num  69.2 41 54.8 52.6 50.3 ...
- $ lifeExp_2002  : num  71 41 54.4 46.6 50.6 ...
- $ lifeExp_2007  : num  72.3 42.7 56.7 50.7 52.3 ...
- $ pop_1952      : num  9279525 4232095 1738315 442308 4469979 ...
- $ pop_1957      : num  10270856 4561361 1925173 474639 4713416 ...
- $ pop_1962      : num  11000948 4826015 2151895 512764 4919632 ...
- $ pop_1967      : num  12760499 5247469 2427334 553541 5127935 ...
- $ pop_1972      : num  14760787 5894858 2761407 619351 5433886 ...
- $ pop_1977      : num  17152804 6162675 3168267 781472 5889574 ...
- $ pop_1982      : num  20033753 7016384 3641603 970347 6634596 ...
- $ pop_1987      : num  23254956 7874230 4243788 1151184 7586551 ...
- $ pop_1992      : num  26298373 8735988 4981671 1342614 8878303 ...
- $ pop_1997      : num  29072015 9875024 6066080 1536536 10352843 ...
- $ pop_2002      : int  31287142 10866106 7026113 1630347 12251209 7021078 15929988 4048013 8835739 614382 ...
- $ pop_2007      : int  33333216 12420476 8078314 1639131 14326203 8390505 17696293 4369038 10238807 710960 ...
+str(data)
 ~~~
-{: .output}
+{: .r}
+
+> ## output
+> ~~~
+> 'data.frame':	142 obs. of  38 variables:
+>  $ continent     : Factor w/ 5 levels "Africa","Americas",..: 1 1 1 1 1 1 1 1 1 1 ...
+>  $ country       : Factor w/ 142 levels "Afghanistan",..: 3 4 11 14 17 18 20 22 23 27 ...
+>  $ gdpPercap_1952: num  2449 3521 1063 851 543 ...
+>  $ gdpPercap_1957: num  3014 3828 960 918 617 ...
+>  $ gdpPercap_1962: num  2551 4269 949 984 723 ...
+>  $ gdpPercap_1967: num  3247 5523 1036 1215 795 ...
+>  $ gdpPercap_1972: num  4183 5473 1086 2264 855 ...
+>  $ gdpPercap_1977: num  4910 3009 1029 3215 743 ...
+>  $ gdpPercap_1982: num  5745 2757 1278 4551 807 ...
+>  $ gdpPercap_1987: num  5681 2430 1226 6206 912 ...
+>  $ gdpPercap_1992: num  5023 2628 1191 7954 932 ...
+>  $ gdpPercap_1997: num  4797 2277 1233 8647 946 ...
+>  $ gdpPercap_2002: num  5288 2773 1373 11004 1038 ...
+>  $ gdpPercap_2007: num  6223 4797 1441 12570 1217 ...
+>  $ lifeExp_1952  : num  43.1 30 38.2 47.6 32 ...
+>  $ lifeExp_1957  : num  45.7 32 40.4 49.6 34.9 ...
+>  $ lifeExp_1962  : num  48.3 34 42.6 51.5 37.8 ...
+>  $ lifeExp_1967  : num  51.4 36 44.9 53.3 40.7 ...
+>  $ lifeExp_1972  : num  54.5 37.9 47 56 43.6 ...
+>  $ lifeExp_1977  : num  58 39.5 49.2 59.3 46.1 ...
+>  $ lifeExp_1982  : num  61.4 39.9 50.9 61.5 48.1 ...
+>  $ lifeExp_1987  : num  65.8 39.9 52.3 63.6 49.6 ...
+>  $ lifeExp_1992  : num  67.7 40.6 53.9 62.7 50.3 ...
+>  $ lifeExp_1997  : num  69.2 41 54.8 52.6 50.3 ...
+>  $ lifeExp_2002  : num  71 41 54.4 46.6 50.6 ...
+>  $ lifeExp_2007  : num  72.3 42.7 56.7 50.7 52.3 ...
+>  $ pop_1952      : num  9279525 4232095 1738315 442308 4469979 ...
+>  $ pop_1957      : num  10270856 4561361 1925173 474639 4713416 ...
+>  $ pop_1962      : num  11000948 4826015 2151895 512764 4919632 ...
+>  $ pop_1967      : num  12760499 5247469 2427334 553541 5127935 ...
+>  $ pop_1972      : num  14760787 5894858 2761407 619351 5433886 ...
+>  $ pop_1977      : num  17152804 6162675 3168267 781472 5889574 ...
+>  $ pop_1982      : num  20033753 7016384 3641603 970347 6634596 ...
+>  $ pop_1987      : num  23254956 7874230 4243788 1151184 7586551 ...
+>  $ pop_1992      : num  26298373 8735988 4981671 1342614 8878303 ...
+>  $ pop_1997      : num  29072015 9875024 6066080 1536536 10352843 ...
+>  $ pop_2002      : int  31287142 10866106 7026113 1630347 12251209 7021078 15929988 4048013 8835739 614382 ...
+>  $ pop_2007      : int  33333216 12420476 8078314 1639131 14326203 8390505 17696293 4369038 10238807 710960 ...
+> ~~~
+> {: .output}
+{: .solution}
+
+- `names()` returns a character vector of all the variable (column) names in the data.
 
 ~~~
 names(data)
 ~~~
 {: .r}
 
-~~~
- [1] "continent"      "country"        "gdpPercap_1952" "gdpPercap_1957" "gdpPercap_1962" "gdpPercap_1967"
- [7] "gdpPercap_1972" "gdpPercap_1977" "gdpPercap_1982" "gdpPercap_1987" "gdpPercap_1992" "gdpPercap_1997"
-[13] "gdpPercap_2002" "gdpPercap_2007" "lifeExp_1952"   "lifeExp_1957"   "lifeExp_1962"   "lifeExp_1967"  
-[19] "lifeExp_1972"   "lifeExp_1977"   "lifeExp_1982"   "lifeExp_1987"   "lifeExp_1992"   "lifeExp_1997"  
-[25] "lifeExp_2002"   "lifeExp_2007"   "pop_1952"       "pop_1957"       "pop_1962"       "pop_1967"      
-[31] "pop_1972"       "pop_1977"       "pop_1982"       "pop_1987"       "pop_1992"       "pop_1997"      
-[37] "pop_2002"       "pop_2007" 
-~~~
-{: .output}
+> ## output
+> ~~~
+>  [1] "continent"      "country"        "gdpPercap_1952" "gdpPercap_1957" "gdpPercap_1962" "gdpPercap_1967"
+>  [7] "gdpPercap_1972" "gdpPercap_1977" "gdpPercap_1982" "gdpPercap_1987" "gdpPercap_1992" "gdpPercap_1997"
+> [13] "gdpPercap_2002" "gdpPercap_2007" "lifeExp_1952"   "lifeExp_1957"   "lifeExp_1962"   "lifeExp_1967"  
+> [19] "lifeExp_1972"   "lifeExp_1977"   "lifeExp_1982"   "lifeExp_1987"   "lifeExp_1992"   "lifeExp_1997"  
+> [25] "lifeExp_2002"   "lifeExp_2007"   "pop_1952"       "pop_1957"       "pop_1962"       "pop_1967"      
+> [31] "pop_1972"       "pop_1977"       "pop_1982"       "pop_1987"       "pop_1992"       "pop_1997"      
+> [37] "pop_2002"       "pop_2007" 
+> ~~~
+> {: .output}
+{: .solution}
 
 > ## Data Structures Challenge
 >
@@ -149,3 +178,5 @@ names(data)
 > > B.  38
 > {: .solution}
 {: .challenge}
+
+[data]: {{ site.baseurl }}/data/gapminder_all.csv
\ No newline at end of file

From d33e9bca57af2b5fcd954b6cc8c4ee1c82bcc989 Mon Sep 17 00:00:00 2001
From: brymz <z.t.brym@gmail.com>
Date: Thu, 8 Dec 2016 14:48:45 -0600
Subject: [PATCH 4/5] 03-data-structures.md: draft episode content

---
 _episodes/03-data-structures.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/_episodes/03-data-structures.md b/_episodes/03-data-structures.md
index cb484cd..5c43690 100644
--- a/_episodes/03-data-structures.md
+++ b/_episodes/03-data-structures.md
@@ -23,6 +23,23 @@ keypoints:
   categories."
 ---
 
+## Data structure is the shape and content of your data.
+
+The shape of your data ought to be rectangular and can be described by the number of rows and columns.
+
+Columns in your data can also be called 'variables' or 'fields'. However, the term 'variable' is also used more specifically for numerically continuous data. As a complement to this specific use, 'categories' describe discrete or categorical data that can be organized in groups.
+
+'Values' make up the rows of the data and are more specifically associated with
+cells in a data table. 'Values' can represent numerical or categorical data.
+Numerical values should be recognized as 'absolute' or 'relative'. 'Absolute'
+values receive context by their units, while 'relative' values are standardized
+in some fashion (e.g., proportion, per unit) and are most often used for comparison
+among categories.
+
+Rows of data may represent observational or experimental 'replicates'. Most
+generally, 'replicates' are values with similar variables and categories that
+are evaluated in data analysis.
+
 > ## Data Organization Challenge
 >
 > What years are represented in the Gapminder data?
@@ -42,3 +59,13 @@ keypoints:
 > > A.  `country`, because it is a 'categorical' variable
 > {: .solution}
 {: .challenge}
+
+## Tidy data follows a set of rules.
+
+These rules keep data well organized and ready for analysis.
+
+1. Order doesn’t matter
+2. No duplicate rows
+3. Every cell contains one value
+4. One column per type of information
+5. No redundant information

From ad601da1c46ce3f87e566a3088adf187d812df36 Mon Sep 17 00:00:00 2001
From: brymz <z.t.brym@gmail.com>
Date: Thu, 8 Dec 2016 16:02:57 -0600
Subject: [PATCH 5/5] 04-dplyr.md: draft episode content

---
 _episodes/04-dplyr.md | 62 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 62 insertions(+)

diff --git a/_episodes/04-dplyr.md b/_episodes/04-dplyr.md
index ae2f3b7..cce650c 100644
--- a/_episodes/04-dplyr.md
+++ b/_episodes/04-dplyr.md
@@ -12,11 +12,71 @@ keypoints:
 - "Use `dplyr` to manipulate, summarize, and analyze your data."
 ---
 
+## A *really* quick refresher on `dplyr`
+
+`dplyr` is a package that needs to be installed once and then loaded in each R session.
+You should have already installed it using `install.packages("dplyr")`.
+
 ~~~
 library(dplyr)
 ~~~
 {: .r}
 
+For this lesson we will primarily use the basic set of `dplyr` functions
+(`select()`, `filter()`, `group_by()`, and `summarize()`) to manipulate the data
+to the form required for our analysis and visualization. A more in-depth
+introduction can be found [here][dplyr-vingette].
+
+Each of the `dplyr` functions takes a `data.frame` (or `tibble`) as its first
+argument.
+  
+`select()` takes one or more column names and returns a data frame with only those columns.
+
+~~~
+select(data, country, pop_1952)
+~~~
+{: .r}
+
+`filter()` takes a logical condition and returns the subset of rows in `data` for which it is true.
+
+~~~
+filter(data, pop_1952 < 500000)
+~~~
+{: .r}
+
+Often, the input is the result of another `dplyr` function. So,
+functions can be nested or piped (`%>%`) to link the data manipulation steps.
+
+~~~
+select(filter(data, pop_1952 < 500000), country, pop_1952)
+~~~
+{: .r}
+
+~~~
+data %>%
+  filter(pop_1952 < 500000) %>%
+  select(country, pop_1952)
+~~~
+{: .r}
+
+`summarize()` uses summary functions (e.g., `min()`, `max()`, `sum()`,
+`mean()`) to collapse many rows into new summary values.
+
+~~~
+summarize(data, count = n(), min_pop1952 = min(pop_1952))
+~~~
+{: .r}
+
+`group_by()` takes one or more categorical columns so that the `summarize()` calculations that follow are computed separately for each group.
+
+~~~
+data %>%
+  group_by(continent) %>%
+  summarize(sum_pop1952 = sum(pop_1952),
+            avg_pop1952 = mean(pop_1952))
+~~~
+{: .r}
+
 > ## dplyr Challenge
 >
 > How many countries are in Africa?
@@ -68,3 +128,5 @@ library(dplyr)
 > > {: .r}
 > {: .solution}
 {: .challenge}
+
+[dplyr-vingette]: https://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html
\ No newline at end of file