---
title: "Hello, R!"
author: "Yue Hu's R Workshop Series III"
output:
ioslides_presentation:
self_contained: yes
incremental: yes
logo: image/logo.gif
slidy_presentation: null
transition: faster
widescreen: yes
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```
## Tabling
There are over twenty packages for [table presentation](http://conjugateprior.org/2013/03/r-to-latex-packages-coverage/) in R. My three favorites are `stargazer`, `xtable`, and `texreg`.
(Sorry, but all of them are built primarily for **LaTeX** output.)
* `stargazer`: good for summary tables and regular regression results.
* `texreg`: when some results can't be presented by `stargazer`, try `texreg` (e.g., MLM results).
* `xtable`: the most widely compatible package, but it needs more settings to produce a pretty table, most of which `stargazer` and `texreg` handle for you automatically.
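----

A minimal sketch of how the other two packages are called, assuming `texreg` and `xtable` are installed. The model names `m_small` and `m_full` are made up for illustration.

```r
# Two nested OLS models on the built-in mtcars data
m_small <- lm(mpg ~ wt, data = mtcars)
m_full  <- lm(mpg ~ wt + hp + cyl, data = mtcars)

# texreg: screenreg() prints a plain-text table to the console;
# texreg() and htmlreg() produce LaTeX and HTML from the same input
texreg::screenreg(list(m_small, m_full))

# xtable: convert the coefficient table, then print it as LaTeX (the default)
tab <- xtable::xtable(m_full, caption = "OLS estimates for mpg")
print(tab)
```
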
## An example {.smaller .columns-2}
```{r message = F}
lm_ols <- lm(mpg ~ cyl + hp + wt, data = mtcars)
stargazer::stargazer(lm_ols, type = "text", align = T)
```
* For Word users, click [here](http://www.r-statistics.com/2010/05/exporting-r-output-to-ms-word-with-r2wd-an-example-session/).
## Print directly to a web page or manuscript {.smaller}
```{r results='asis'}
stargazer::stargazer(lm_ols, type = "html", align = T)
```
# But... why tabulate the results if you can plot them?
## What do R plots look like?
<div class="centered">
<img src="http://mkweb.bcgsc.ca/embo/img/hiveplot-02.png" height="450"/>
</div>
----
<div class="center">
<img src="http://spatial.ly/wp-content/uploads/2012/02/bike_ggplot-1024x676.png" height="600"/>
</div>
----
<div class="center">
<img src="http://i.imgur.com/ELEA9FP.gif" height="550"/>
</div>
## Too "fancy" for your research? Then...
* <div class="centered">
<img src="http://fsolt.org/blog/dotwhisker1.jpg" height="530"/>
</div>
----
<div class="centered">
<img src="" height="550" width = "500"/>
</div>
----
<div class="centered">
<img src="http://fsolt.org/blog/interplot1.png" height="450"/>
</div>
## Let's Start!
* Basic plots: e.g., `plot()`.
* Lattice plots: e.g., `ggplot()`.
* Interactive plots: e.g., the `shiny` package. (saved for later)
+ <div class="centered">
<img src="http://i.stack.imgur.com/qZObK.png" height="300"/>
</div>
## Basic plot
Pro:
* Embedded in R
* Good tool for <span style="color:purple">data exploration</span>.
* <span style="color:purple">Spatial</span> analysis and <span style="color:purple">3-D</span> plots.
Con:
* Not very pretty
* Not very flexible
## An example: create a histogram
```{r fig.align="center"}
hist(mtcars$mpg)
```
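----

The same base graphics system also handles two-variable exploration. The following is a small sketch using `plot()` and `abline()` on `mtcars`; the object name `fit` and the axis labels are illustrative choices.

```r
# Fit a simple regression so there is a line to overlay
fit <- lm(mpg ~ wt, data = mtcars)

# Scatterplot of fuel economy against weight
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy by weight", pch = 19)

# Add the fitted regression line to the open plot
abline(fit, lty = 2)
```
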
## Saving the plot{.build}
* Compatible formats: `.jpg`, `.png`, `.wmf`, `.pdf`, `.bmp`, and PostScript (`.ps`).
* Process:
1. call the graphic device
2. plot
3. close the device
```{r eval = F}
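jpeg("histgraph.jpg")   # 1. open the graphic device
hist(mtcars$mpg)        # 2. draw the plot
dev.off()               # 3. close the device to write the file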
```
<span style="color:green">Tip</span>
<div class="notes">
Sometimes RStudio distorts the graphic output. If so, try the <span style="color:purple">zoom</span> button or open a separate window with the `windows()` function (Windows only).
</div>
----
The device list:
| Function | Output to |
|----------------------------- |------------------ |
| pdf("mygraph.pdf") | pdf file |
| win.metafile("mygraph.wmf") | windows metafile |
| png("mygraph.png") | png file |
| jpeg("mygraph.jpg") | jpeg file |
| bmp("mygraph.bmp") | bmp file |
| postscript("mygraph.ps") | postscript file |
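----

Every device in the list follows the same open, plot, close pattern. Here is a sketch that writes one histogram through two of them. The file names and sizes are arbitrary, and `tempdir()` is used only to keep the working directory clean.

```r
# Build the output paths inside R's temporary directory
png_file <- file.path(tempdir(), "mygraph.png")
pdf_file <- file.path(tempdir(), "mygraph.pdf")

# png(): width and height are in pixels by default
png(png_file, width = 800, height = 600)
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
dev.off()

# pdf(): width and height are in inches
pdf(pdf_file, width = 7, height = 5)
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")
dev.off()

# Check that both files were written
file.exists(c(png_file, pdf_file))
```
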
## `ggplot`: the most popular graphic engine in R {.build}
+ Built by Hadley Wickham based on Leland Wilkinson's *Grammar of Graphics*.
+ It breaks a plot into components such as <span style="color:purple">scales</span> and <span style="color:purple">layers</span>, which increases flexibility.
+ To use `ggplot`, one needs to install the package `ggplot2` first.
```{r message=FALSE}
library(ggplot2)
```
## Histogram in `ggplot`
```{r fig.align="center", fig.height=2.7}
# after_stat(density) replaces the older ..density.. notation (ggplot2 >= 3.3.0)
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2, colour = "black")
```
## Decoration
```{r fig.align="center", fig.height=2.7}
ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2, colour = "black", fill = "purple") +
  geom_density(alpha = .2, fill = "blue") + # Overlay with transparent density plot
  theme_bw() + ggtitle("Histogram with a Density Curve") +
  xlab("Miles Per Gallon") + ylab("Density")
```
## Breaking It into Parts {.smaller}
```{r eval=FALSE}
ggplot(data = mtcars, aes(x = mpg)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 2, colour = "black", fill = "purple") +
  geom_density(alpha = .2, fill = "blue") + # Overlay with transparent density plot
  theme_bw() + ggtitle("Histogram with a Density Curve") +
  xlab("Miles Per Gallon") + ylab("Density")
```
* `data`: the data that you want to visualise
* `aes`: aesthetic mappings describing how variables in the data are mapped to aesthetic attributes
    + horizontal position (`x`)
    + vertical position (`y`)
    + colour
    + size
* `geoms`: geometric objects that represent what you actually see on the plot
    + points
    + lines
    + polygons
    + bars
----
* `theme`, `ggtitle`, `xlab`, `ylab`: decorations.
* Other components you may see in more developed templates:
    + `stats`: statistical transformations of the data
    + `scales`: relate the data values to the aesthetics
    + `coord`: a coordinate system that describes how data coordinates are mapped to the plane of the graphic
    + `facet`: a faceting specification that describes how to break up the data into subsets
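----

To make these components concrete, here is a sketch that uses one of each on `mtcars`. The particular breaks, limits, and faceting variable are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +                             # geom: points
  stat_smooth(method = "lm", se = FALSE) +   # stat: fitted linear trend
  scale_x_continuous(breaks = 2:5) +         # scale: where the x-axis breaks fall
  coord_cartesian(ylim = c(10, 35)) +        # coord: zoom without dropping data
  facet_wrap(~ am, labeller = label_both)    # facet: one panel per transmission type
p
```
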
## An advanced version:
```{r fig.height=3}
library(dplyr)
df_desc <- select(mtcars, am, carb, cyl, gear,vs) %>% # select the variables
tidyr::gather(var, value) # reshape the wide data to long data
ggplot(data = df_desc, aes(x = as.factor(value))) + geom_bar() +
facet_wrap(~ var, scales = "free", ncol = 5) + xlab("")
```
## Save `ggplot`
* `ggsave("<name + type>", <plot object>)`:
    + When the `<plot object>` is omitted, R saves the last displayed plot.
    + Additional arguments adjust the size, path, scale, etc.
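----

A minimal sketch of a `ggsave()` call. The object name `p`, the file name, and the dimensions are placeholders, and the file is written to `tempdir()` only to avoid cluttering the project folder.

```r
library(ggplot2)

# Build a plot and keep it in an object so it can be passed to ggsave()
p <- ggplot(mtcars, aes(x = mpg)) +
  geom_histogram(binwidth = 2, colour = "black")

# The file type is inferred from the extension of the file name
out_file <- file.path(tempdir(), "mpg_hist.pdf")
ggsave(out_file, plot = p, width = 6, height = 4, units = "in")
```

Passing `plot = p` explicitly avoids depending on whichever plot happened to be displayed last.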
## Plotting with packages: `dotwhisker`{.smaller}
Plot coefficients or other estimates (margins, predicted probabilities, etc.) so they can be compared across models.
```{r message=FALSE}
library(dotwhisker)
library(broom)
m1 <- lm(mpg ~ wt + cyl + disp + gear, data = mtcars)
```
----
```{r}
summary(m1)
```
----
```{r}
dwplot(m1)
```
----
```{r message=F, fig.align="center", fig.height=4}
m2 <- update(m1, . ~ . + hp) # add another predictor
m3 <- update(m2, . ~ . + am) # and another
dwplot(list(m1, m2, m3))
```
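----

`dwplot()` also accepts a tidy data frame of estimates, which is where `broom` comes in. This is a sketch, assuming `dotwhisker` and `broom` are installed; the labels "Model 1" and "Model 2" are arbitrary.

```r
library(dotwhisker)
library(broom)

m1 <- lm(mpg ~ wt + cyl + disp + gear, data = mtcars)
m2 <- update(m1, . ~ . + hp)

# tidy() turns each model into a data frame with term, estimate, std.error, ...
m1_df <- tidy(m1)
m2_df <- tidy(m2)

# dwplot() tells the models apart through a `model` column
m1_df$model <- "Model 1"
m2_df$model <- "Model 2"

both <- rbind(m1_df, m2_df)
both <- both[both$term != "(Intercept)", ]   # drop the intercepts

dwplot(both)
```

Working from a data frame is handy when the estimates come from a procedure that `dwplot()` cannot tidy by itself.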
----
```{r eval = F}
dwplot(list(m1, m2, m3)) +
relabel_y_axis(c("Weight", "Cylinders", "Displacement",
"Gears", "Horsepower", "Manual")) +
theme_bw() + xlab("Coefficient Estimate") + ylab("") +
geom_vline(xintercept = 0, colour = "grey60", linetype = 2) +
ggtitle("Predicting Gas Mileage") +
theme(plot.title = element_text(face="bold"),
legend.justification=c(0, 0), legend.position=c(0, 0),
legend.background = element_rect(colour="grey80"),
legend.title = element_blank())
```
----
```{r echo = F}
dwplot(list(m1, m2, m3)) +
relabel_y_axis(c("Weight", "Cylinders", "Displacement",
"Gears", "Horsepower", "Manual")) +
theme_bw() + xlab("Coefficient Estimate") + ylab("") +
geom_vline(xintercept = 0, colour = "grey60", linetype = 2) +
ggtitle("Predicting Gas Mileage") +
theme(plot.title = element_text(face="bold"),
legend.justification=c(0, 0), legend.position=c(0, 0),
legend.background = element_rect(colour="grey80"),
legend.title = element_blank())
```
## Plotting with packages: `interplot`{.smaller}
```{r message=FALSE}
library(interplot)
lm_in <- lm(mpg ~ cyl + hp * wt, data = mtcars)
```
----
```{r}
summary(lm_in)
```
----
```{r fig.align="center"}
interplot(m = lm_in, var1 = "hp", var2 = "wt", hist = TRUE) +
xlab("Automobile Weight (thousands lbs)") +
ylab("Estimated Coefficient for \nGross horsepower")
```
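----

The quantity on the vertical axis is the coefficient of `hp` conditional on `wt`. Here is a hand-computed sketch in base R, using an analytic standard error. This illustrates the idea and is not necessarily how `interplot` computes its intervals internally; the five-point grid `wt_grid` is an arbitrary choice.

```r
lm_in <- lm(mpg ~ cyl + hp * wt, data = mtcars)
b <- coef(lm_in)
V <- vcov(lm_in)

# Weights at which to evaluate the conditional effect of hp
wt_grid <- seq(min(mtcars$wt), max(mtcars$wt), length.out = 5)

# Conditional coefficient of hp: b_hp + b_hp:wt * wt
cond_coef <- b[["hp"]] + b[["hp:wt"]] * wt_grid

# Standard error of that linear combination of two coefficients
cond_se <- sqrt(V["hp", "hp"] + wt_grid^2 * V["hp:wt", "hp:wt"] +
                2 * wt_grid * V["hp", "hp:wt"])

# 95% confidence bounds based on the t distribution
crit <- qt(0.975, df.residual(lm_in))
data.frame(wt = wt_grid,
           coef = cond_coef,
           lower = cond_coef - crit * cond_se,
           upper = cond_coef + crit * cond_se)
```
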
## Wrap Up
* R has a number of packages for creating publication-quality tables, e.g., `stargazer`, `xtable`, and `texreg`.
* There are three ways to visualize statistics in R: basic, lattice (`ggplot`), and interactive.
    + basic: e.g., `hist(<vector>)`
    + `ggplot`: e.g., `ggplot(<data>, aes(x=<vector>)) + geom_histogram()`
* Two special types of plot:
    + Estimate plot with [`dotwhisker`](https://cran.r-project.org/web/packages/dotwhisker/vignettes/dwplot-vignette.html).
    + Interaction plot with [`interplot`](https://cran.r-project.org/web/packages/interplot/vignettes/interplot-vignette.html).
## Almost the end: one topic left
<div class="centered">
![present](http://conservatives4palin.com/wp-content/uploads/2013/06/snob.gif)
</div>
# Version Control
## Just a brief introduction{.columns-2 .build}
<div class = "center">
<img src= "http://www.foldertrack.com/images/Personal_Version_Mess.png" width = "400" height = "400" />
</div>
* Ever tried to recover deleted code?
* Ever tried to figure out what changed?
* Ever saved a pile of replication files?
* Version control can help you.
----
<div class = "center">
<img src="http://cdn.arstechnica.net//wp-content/uploads/2012/05/uncommitted-changes-1.png" />
</div>
## Using Git with RStudio
* RStudio integrates well with Git and SVN.
* Process to use Git:
    + Create a user account at https://github.com.
    + Connect your account with RStudio following [these instructions](http://www.molecularecologist.com/2013/11/using-github-with-r-and-rstudio/).
    + Create a version-control project in RStudio.
    + <img src="http://i0.wp.com/geraldbelton.com/wp-content/uploads/2017/01/new-project.jpg" height = "200" />
    + Commit, pull, and push.
## External Sources
* Q&A sites:
    + http://stackoverflow.com/questions/tagged/r
    + https://stat.ethz.ch/mailman/listinfo/r-help
* Blog for new developments: http://www.r-bloggers.com/
* Graph blogs:
    + http://www.cookbook-r.com/Graphs/
    + http://shiny.stat.ubc.ca/r-graph-catalog/
* Workshops: http://ppc.uiowa.edu/node/3608
* Consulting service: http://ppc.uiowa.edu/isrc/methods-consulting
----
<div class = "center">
![end](http://rescuethepresent.net/tomandjerry/files/2016/05/16-thanks.gif)
</div>