# Introduction to programming in R {#Chapter1}
<img src="images/roh.png" alt="alcohol molecule">
<p style="font-family: times, serif; font-size:.9em; font-style:italic">
Title image. Read about it [there]{#title}.
</p>
<br>
Welcome to programming in R! This module will serve as a tutorial to help you get acquainted with the R programming environment, and will get you started with some basic tools and information that will help you along your way.
We will use the [RStudio IDE](https://rstudio.com/) to work with [R](https://www.r-project.org/) in this class. It is important to note here that R is the program doing all of the thinking when we write and run code, and RStudio is a software tool that makes it a little easier to work with R, so we're going to need them both (plus a few other tools we'll check out along the way).
## What is R?
Go Google it. This is The Worst Stats Text eveR.
Okay, okay. Briefly, R is a statistical programming language. That language is made up of functions and various objects (R is functional and object-oriented). Objects are things that we do stuff to, or that we create by doing stuff. Functions are the things that do stuff to objects or create objects by doing things. A lot of functions and objects are included in the `base` software distribution. Other collections of functions and objects are available through "packages". You could think of these packages like web extensions, add-ins for Microsoft programs, or mods for Minecraft. These packages may be written in R or built on a variety of other programming languages you may have heard of like C, C++, Java, Python, etc. You can see a YouTube demo of installing packages in RStudio <a href="https://www.youtube.com/watch?v=u1r5XTqrCTQ">here</a>. We will talk more about this later.
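If "objects" and "functions" still sound abstract, here is a minimal sketch that uses only functions shipped with base R. The object names (`heights`, `avg_height`) and the numbers are made up for illustration.

```{r}
# Create an object (a numeric vector) with the function c()
heights <- c(150, 162, 171)

# Do stuff to that object with another function, and
# store the result in a new object
avg_height <- mean(heights)
avg_height

# Ask R what kind of thing each one is.
# Functions are objects, too.
class(heights)
class(mean)
```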
Because R is open-source, anybody can write packages (even me). Therefore, there are lots of packages out there, and many of them have functions that do the same thing but have slightly different names or behaviors. This framework and an avid user community have propelled the capabilities of R and RStudio during recent years, and now R can do everything from basic arithmetic to spatial time-series analysis to <a href="https://www.r-bloggers.com/make-your-amazon-purchases-with-r/">searching Amazon</a>. This means that learning R is also a lot like learning the English language because there are about 10 ways to do everything and many of those are based on other programming languages.
## Why should I use R?
For now: because this whole class revolves around your using R. If you don't, you'll fail or look silly at a job interview. I started using R because I needed it to finish my master's thesis. I'd like to think some people start using R "just because" they want to, but usually those people just say they want to start it.
Students are in a unique position to be able to do the things they want to do because they have to do them (somebody write that down). Most of us should probably make it more of a priority.
On that note, hopefully the "why" becomes obvious to you during our time together even if you don't want to be a data scientist or a modeler. If you only ever use R to do t-tests or make descriptive plots it is worth learning. The ability to re-use the same code for a later analysis alone can save you hours. You never lose what you write (and back up!). So, the more and the longer you write R code, the more time you will have to do other things in life that you care more about (as if). If R *is* what you'll love, then hopefully we can help you enjoy that more, too. It's the software that everyone is using because of these things and more, and the development community has continued to grow during the past two decades. That means help is everywhere. Go Google it.
## Where do I start?
If you haven't downloaded and installed the most recent versions of R and RStudio, you should probably go do that now. We'll wait...
Once you have installed both of these, find and open RStudio on your computer so you can work along with the examples below.
It may be helpful to watch a couple of YouTube videos before going much further, especially if you are stuck already (no shame). There are tons of them out there, including some that walk you through how to install and open R and RStudio. They range from just a couple of minutes to a couple of hours. Here's one example provided by the [How To R Channel](https://www.youtube.com/watch?v=lVKMsaWju8w).
Depending on how long that took, you may or may not be enthused by the following:
> the learning curve for R is steep...like a cliff, not a hill.
But, once you get the hang of it you can learn a lot really quickly. Cheat sheets like [these](https://www.rstudio.com/resources/cheatsheets/) can help you along the way by serving as miniature reference manuals in the meantime. There are also *tons* of e-books and websites out there like the one you are reading now. And, there is a huge, active user community just a Google away. Just searching "how to ___ in R" will return multiple results for most questions, with everything from open-source textbooks like this to R project websites (e.g. [RStan](https://mc-stan.org/users/interfaces/rstan)) or programming forums like [StackOverflow](https://stackoverflow.com/questions/tagged/r). You can find links to a few [Additional Resources](https://danstich.github.io/stich/classes/BIOL217/resources.html) on the course website, but part of learning R is learning how to Google about R.
## Programming conventions
### Style and organization {-#style}
Learning to write code will be easier if you bite the bullet early on and adopt some kind of organization that allows you to interact with it (read, write, run, stare aimlessly, debug) more efficiently.
There are a lot of different ways to write computer code. All of them are intended to increase efficiency and readability. Some rules are more hard-coded and program-specific than others. For example, students in this class will notice that none of my code goes beyond a certain vertical line in the editor. That is to make it so that people don't have to scroll over to the right of the editor to see what I have written when I email them code. When I share code with students I tend to justify everything *really* far to the left because everyone works on tiny laptops with multiple windows open and none of them maximized [shudders].
I suppose there is no "right" way to edit your code, but it will make your life easier if you find a style you like and stick to those conventions. If you are the kind of person who needs order in your life, you can check out the `tidyverse` [style guide](https://style.tidyverse.org/documentation.html) for tips. If you're thinking that may be a lot of work to remember on the front end, you can check code style with the [`lintr`](https://github.com/jimhester/lintr) package or interactively re-style your code with the [`styler`](https://styler.r-lib.org/) package.
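As a rough sketch of what those two packages do, the chunk below re-styles and then checks one messy line of code. It assumes you have installed both `styler` and `lintr` (neither is part of base R), and the `messy` string and temporary file are made up for illustration.

```{r, eval = FALSE}
# One-time setup, if needed:
# install.packages(c("styler", "lintr"))

# A deliberately messy line of code, stored as text
messy <- "myObject=c(1,2,3)"

# Print a re-styled version of that text
styler::style_text(messy)

# Write the text to a temporary file and
# check that file for style problems
tmp <- tempfile(fileext = ".R")
writeLines(messy, tmp)
lintr::lint(tmp)
```

Neither call changes your own scripts here: one only prints the re-styled text, and the other only reports on the temporary file.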
Regardless of how you end up styling your code, here are a few helpful hints that ought to help you get comfortable with your keyboard. I guess these are probably generally applicable to programming and not specific to R.
### Some handy coding tips {-#tips}
**Get to know your keyboard and your speed keys for code execution and completion.** Use the mouse to navigate the GUI, not to write code. Here is a fairly comprehensive list of speed-key combinations for all of the major operating systems from the [RStudio website](https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts). You don't need to know them all, but learning a few can save you a **ton** of time.
<br>
**File management is wicked important.** This is probably one of the primary struggles folks have with starting to learn R and other languages. At the same time, it is a big part of the secret sauce behind good programming. **For this class, I will assume that you are working out of a single working directory (call it something like "quant_bio" or "biol217"). That means I will assume your scripts (`.R` files) for each chapter are in the same folder on your computer as the folder that contains your data.**
An example of your class folder might look like this:
<img src="images/folders.png" alt="">
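A quick way to check that R is looking where you think it is looking is sketched below. The `data` folder and `my_data.csv` file are hypothetical names, so swap in your own.

```{r}
# Print the folder R is currently working out of
getwd()

# List the files and folders R can see there
list.files()

# Build a path relative to the working directory
# ("data" and "my_data.csv" are made-up names)
data_path <- file.path("data", "my_data.csv")
data_path

# Check whether that file exists before trying to read it
file.exists(data_path)
```

If `file.exists()` returns `FALSE` for a file you know you have, the working directory is the first thing to check.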
**Save early and often**
In general, RStudio is really good about keeping track of things for you, and it is more and more foolproof these days. However, there are still times when it will crash and there is nothing you can do to get your work back unless it has been saved to a file. So, whenever you write code, write it in a source file that is saved in a place you know you can find it. It is the first thing I do when I start a script, and the last thing I do before I run any code.
Please go check out the supplemental materials on the course website or check out the YouTube video linked above for more help getting started in R if you have no idea what I am talking about at this point.
<br>
**Commenting code is helpful**
And I will require that you do it, at least to start. Comments are a way for you to explain what your code does and why. This is useful for sharing code or just figuring out what you did six months ago. It could also be that critical piece of clarity that makes me say "Oh, I see what you did there, +1" on your homework.
```{r}
# This is a comment.
# We know because it is preceded
# by a hashtag, or "octothorpe".
# R ignores comments so you have
# a way to write down what you have
# done or what you are doing.
```
<br>
**Section breaks help organization**
I like to use the built-in heading style. It works really well for code-folding in RStudio, and when I've written a script that is several hundred lines long, sometimes all I want to see is an outline. Go ahead and type the code below into a source file (File > New File > R Script or `Ctrl+Shift+N`) and save it (File > Save As or `Ctrl+S`). Press the little upside-down triangle to the left of the line to see what it does.
```{r}
# Follow a comment with four dashes or hashes
# to insert a section heading
# Section heading ----
# Also a section heading ####
```
This is really handy for organizing sections in your homework or for breaking code up into smaller sections when you get started. You'll later learn that when you have to do this a lot, there are usually ways you can reduce your code or split it up more efficiently into other files.
### Stricter R programming rules {-#rules}
For the next section, open RStudio if it is not already open and type the code into a new source file (`Ctrl+Shift+N`).
<br>
**All code in R is case sensitive.**
Run the following lines (with the Run button or `Ctrl+Enter`). If you highlight all of them, they will all be run in sequence from top to bottom. Or, you can manually run each line. Running each line can be helpful for learning how to debug code early on.
```{r}
# Same letter, different case
a <- 1
A <- 2
a == A
```
<br>
So, what just happened? A few things going on here.
1. We've defined a couple of objects for the first time. If we translate the first line of code, we are saying, "Hey R, assign the value of `1` to an object named `a` for me."
2. Note that the two objects are not the same, and R knows this.
3. The `==` that we typed is a logical test that checks to see if the two objects are equal. If they were, then it would have returned `TRUE` instead of `FALSE`. This **operator** is very useful, and is more or less ubiquitous in programming languages. We will use it extensively for data queries and conditional indexing (ooooh, I know!).
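To see a bit more of what `==` does, here is a small sketch that reuses `a` and `A` from above. When one side is a vector, the comparison is made element by element. The base R function `identical()` is a stricter check that always returns a single `TRUE` or `FALSE`.

```{r}
a <- 1
A <- 2

a == A            # FALSE
a == 1            # TRUE

# Compare each element of a vector to A
c(1, 2, 3) == A   # FALSE TRUE FALSE

# One answer for the whole comparison
identical(a, A)   # FALSE
```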
<br>
**R will overwrite objects sequentially, so don't name two things the same, unless you don't need the first.**
```{r, eval = FALSE, echo = TRUE}
a <- 1
a <- 2
a # a takes on the second value here
print(a) # This is another way to look at the value of an object
show(a) # And, here is one more
```
<br>
**Names should be short and meaningful.** `a` is a terrible name, even for a temporary object in most cases.
```{r, eval=FALSE}
myFirstObject <- 1
```
Cheesy, but better...
<br>
**Punctuation and special symbols are important.** And, they are annoying to type in names. Avoid them in object names where you can, except for underscores (`_`). I try to stick with lowercase for everything I do except built-in data and data from external files because it is a pain to change everything.
```{r, eval = FALSE}
myobject <- 1 # Illegible
my.Object <- 1 # Annoying to type
myObject <- 1 # Better, but still annoying
my_object <- 1 # Same: maybe find a less annoying name?
```
Importantly, R doesn't really care about any of this: it would treat all four of these names as separate objects that otherwise behave the same way. It's worth noting that most R style recommendations are moving toward the last example above.
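If you want to convince yourself, a minimal sketch like this one shows that R keeps all four names separate. Different values are assigned here only so we can tell the objects apart.

```{r}
myobject <- 1
my.Object <- 2
myObject <- 3
my_object <- 4

# Each name points to its own value
c(myobject, my.Object, myObject, my_object)

# Changing one does not change the others
my_object <- 40
myObject
```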
<br>
**Some symbol combinations are not allowed in object names.** But, these are usually bad names or temporary objects that create junk in your workspace anyway.
```{r, eval=FALSE}
# Names cannot start with a number.
# In RStudio there are nifty little
# markers to show this is broken
# (uncomment the next line to see)
# 1a <- 1
# This one works (try it by typing
# "a1" in the console after you run
# the code below)
a1 <- 1
a2 <- a1 + 1
a3 <- a2 + 1
```
We'll see later that sequential operations that require creation of redundant objects (that require memory) are usually better replaced by over-writing objects in place or using functions like the pipe `%>%` from the `magrittr` package that help us keep a "tidy" workspace.
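As a rough preview (and not the only way to do it), the sketch below gets to the same value as `a3` without leaving `a1` and `a2` lying around. The second half assumes the `magrittr` package is installed and uses its `add()` function, which is just `+` in a pipe-friendly form.

```{r, eval = FALSE}
# Over-write one object in place
a <- 1
a <- a + 1
a <- a + 1
a

# Or chain the steps with the pipe (requires magrittr)
library(magrittr)
b <- 1 %>%
  add(1) %>%
  add(1)
b
```

Either way, only one object is left in the workspace at the end.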
<br>
**Some things can be expressed in multiple ways.** Both `T` and `TRUE` can be used to indicate a logical value of `TRUE`. But `t()` is a function used to transpose data.
```{r, eval=FALSE}
T == TRUE
```
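Lowercase `t` is a different animal altogether. As a quick sketch, here it is flipping the rows and columns of a small made-up matrix `m`:

```{r}
# A matrix with 2 rows and 3 columns
m <- matrix(1:6, nrow = 2)
m
dim(m)      # 2 3

# Transposed: 3 rows and 2 columns
t(m)
dim(t(m))   # 3 2
```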
<br>
**Some names are "reserved", "built-in", or pre-defined.** Did you notice that R already knew what `T` and `TRUE` were? We will talk more about this later in the course if we need to.
Other examples include reserved words like `in`, `if`, `else`, `for`, and `function`, plus a mess of others that have special uses.
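One wrinkle worth a quick sketch: `TRUE` is truly reserved, so R refuses to let you overwrite it, while `T` is only a pre-defined object that you can clobber by accident. That is a good reason to spell out `TRUE` and `FALSE` in your own code. The `try()` wrapper here just keeps the error from stopping the script.

```{r}
# Reserved words cannot be assigned to
res <- try(TRUE <- 0, silent = TRUE)
class(res)   # "try-error"

# But T is just an object that comes with R,
# so this works (please don't do it for real)
T <- 0
T == TRUE    # FALSE

# Remove our copy to get the built-in T back
rm(T)
T == TRUE    # TRUE
```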
<br>
**Some *symbols* are also reserved for special use as "operators", like:**
`+, -, *, % %, &, /, <, (, {, [, "", '', ...`, and a bunch of others. We will use basically all of these in just the first couple of chapters.
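Here are a few of these in action, as a small sketch with made-up numbers. The `% %` in the list above stands for a whole family of operators written between percent signs, such as `%%` (remainder), `%/%` (integer division), and `%in%` (matching), all of which are part of base R.

```{r}
7 + 2               # 9
7 / 2               # 3.5
7 %% 2              # 1
7 %/% 2             # 3
3 %in% c(1, 2, 3)   # TRUE
(7 > 2) & (2 > 7)   # FALSE
```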
## Next steps {#next1}
These are just some basic guidelines that should help you get started working with R and RStudio. In [Chapter 2](#data_struc), we will begin working with objects, talk about how R sees those objects, and then look at things we can do to those objects using functions.