---
title: "Intro to R"
output:
  learnr::tutorial:
    css: "css/style.css"
    progressive: true
    allow_skip: true
runtime: shiny_prerendered
---
```{r setup, include=FALSE}
# Author: Russell McCreath
# Original Date: Aug 2020
# Version of R: 3.6.1
library(learnr)
library(gradethis)
library(stringr)
library(readr)
library(haven)
library(dplyr)
library(purrr)
library(ggplot2)
knitr::opts_chunk$set(echo = FALSE)
tutorial_options(
  exercise.checker = gradethis::grade_learnr
)
borders_data <- readRDS("www/data/borders.rds")
borders_age_data <- read_csv("www/data/BORDERS (inc Age).csv")
baby5 <- read_csv("www/data/Baby5.csv")
baby6 <- read_csv("www/data/Baby6.csv")
```
```{r phs-logo, echo=FALSE, fig.align='right', out.width="40%"}
knitr::include_graphics("images/phs-logo.png")
```
## Introduction
Welcome to an Introduction to R. This course is designed as a self-led introduction to R for anyone in Public Health Scotland. Throughout this course there will be quizzes to test your knowledge and opportunities to modify and write R code. Below is an overview of the learning pathway.
```{r intro-pathway, echo=FALSE, fig.align='center', out.width="100%"}
knitr::include_graphics("images/r-intro-pathway-2.png")
```
<div class="info_box">
<h4>Course Info</h4>
<ul>
<li>This course is built to flow through sections and build on previous knowledge. If you're comfortable with a particular section, you can skip it.</li>
<li>Most sections have multiple parts to them. Navigate the course by using the buttons at the bottom of the screen to Continue or go to the Next Topic.</li>
<li>The course also shows your progress through sections: a green tick will appear on sections you've completed, and it will remember your place if you decide to close your browser and come back later.</li>
</ul>
</div>
</br>
### What is R?
* A programming language widely used for data analysis, statistics, and graphics
* Open source and available on all major operating systems
* Has the functionality to go from raw data to interactive reports, web apps, and more
* A part of the PHS analytical strategy
<div>
```{r echo=FALSE, fig.align='center', out.width="75%"}
knitr::include_graphics("images/illustrations/r_welcome_twitter.png")
```
</div>
</br>
Since we're getting started, here's a quiz to get familiar with the layout:
```{r intro-quiz}
quiz(
  question("Which of the following can R help to produce?",
    answer("Tidy Data", correct = TRUE),
    answer("Dashboards", correct = TRUE),
    answer("Static/Interactive Reports", correct = TRUE),
    answer("Web Apps", correct = TRUE),
    answer("Magic", message = "It may seem like R is magic but unfortunately not"),
    answer("Databases", correct = TRUE),
    answer("Presentations", correct = TRUE),
    incorrect = "Not quite, have another go!",
    allow_retry = TRUE,
    random_answer_order = TRUE
  )
)
```
## Foundations
We're going to start with a high-level overview of programming concepts which will help lay the foundations for building your R skills. We'll then build on these concepts with the specific syntax in R, hopefully having some fun along the way. This graphic shows the structure of the concepts and how they come together to form a program:
<div class="supporting-image-left">
```{r foundations-buildingblocks, echo=FALSE, fig.align='center', out.width="75%"}
knitr::include_graphics("images/r-foundations-buildingblocks.png")
```
</div>
1. **Basic data types** - representing fundamental data, like numbers and text.
2. **Complex data types** - taking basic data types and forming more complex, composite data types, e.g. tables.
3. **Variables** - named storage to track "objects" across a program.
4. **Statements** - a complete line of code, made of expressions and operators.
5. **Control Flow** - branching (if statements) and iteration (loops).
6. **Functions** - reusable chunks of code that can take inputs and give outputs.
</br>
### R Foundations
Base R is the fundamental language and what we'll explore in this section.
#### Anatomy of a Program
Below is an example of R code that includes various types of R syntax:
* `# Hello World example` - **comments** are just for us humans and are ignored by R. They aid understanding of code; think more "why" than "what". Comments are created using the `#` symbol, and everything after it on that line is ignored.
* `hello_world` - **variables** are names (containers) we give to 'objects'.
* `<-` - **assignment operator** (shortcut: `alt` + `-`) is how we give variables their content.
* `"Hello World"` - **character** / string (`'` or `"`) is one of the *basic data types* in R, we go over the others next.
* `print()` - **functions** allow us to get R to do something by passing in arguments (the stuff inside the brackets).
In most cases, if you're not assigning something the result will be printed to the console. This is the same as using the `print()` function.
Have a look and click 'Run Code' below to see the output. Then, change the code to get R to print "Hello \<your name\>".
```{r foundations-input, exercise=TRUE}
# Hello World example
hello_world <- "Hello World"
print(hello_world)
```
```{r foundations-input-check}
grade_result(
  pass_if(~ startsWith(.result, "Hello") & .result != "Hello World"),
  fail_if(~ identical(as.character(.result), "Hello World"), "Try changing the output from Hello World."),
  fail_if(~ TRUE, "Have you entered a string to say Hello?")
)
```
</br>
#### Style Guide
Maintaining a style guide ensures that code can be more easily shared and understood. The [PHS R Style Guide](https://github.com/Public-Health-Scotland/R-Resources/blob/master/PHS%20R%20style%20guide.md) is the style guidance for everyone using R within PHS. It is designed to allow flexibility for working across different projects but is detailed enough to provide benefit.
A couple of high-level important points to take forward:
* **Naming** - variables and filenames should have meaningful names in *`snake_case`* format, preferring all lower case.
<div>
```{r echo=FALSE, fig.align='center', out.width="100%"}
knitr::include_graphics("images/illustrations/r_coding_cases.png")
```
</div>
* **Structure**:
    * Spaces after commas (just like in English prose).
    * No spaces before or after parentheses.
    * Comments are used to explain code and create sections/structure.
    * Prefer `"` over `'` for character strings.
### Basic Data Types
R has some basic/primitive data types:
* **Character**, also called strings, are written in single `'` or double `"` quotes around text, numbers, or symbols. It may be necessary to store a "number" as a character object, e.g. CHI numbers to preserve formatting. *Example:* `"Hello World!"`
* **Numerical** type holds the whole set of real numbers. *Example:* `321` or `123.5`
* **Logical** (Boolean) can be `TRUE`, `FALSE`, `T`, or `F`, other variations (e.g. lowercase) will result in an error. *Example:* `TRUE`
* **Complex** stores complex number objects, e.g. imaginary numbers. *Example:* `2i`
Have a look and click 'Run Code' below to see the output. `typeof()` is a function that returns the argument's basic data type. `is.<data_type>()` returns a Boolean `TRUE` or `FALSE` depending on whether the argument provided is the data type in the name of the function.
```{r foundations-basic-types, exercise=TRUE}
typeof("Hello World")
is.numeric(123.5)
print(typeof(2 + 2i))
```
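One detail worth knowing when you check types: R stores plain numbers as type `double`, and only stores a whole number as `integer` when it is written with an `L` suffix. Both count as numeric. The short sketch below uses made-up values to show this; it is an illustration rather than an exercise.

```r
# Plain numbers are stored as double-precision values
type_plain <- typeof(321)
# The L suffix asks R to store the value as an integer instead
type_int <- typeof(321L)
# Both are numeric, but a number wrapped in quotes is a character
num_checks <- c(is.numeric(321), is.numeric(321L), is.numeric("321"))

print(type_plain)  # "double"
print(type_int)    # "integer"
print(num_checks)  # TRUE TRUE FALSE
```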
</br>
#### Type Conversion
It's sometimes necessary to convert from one basic data type to another. For example, we may receive a data-set with Boolean values stored as `1` and `0` or lowercase `true` and `false`. The `as.<data_type>()` family of functions performs this conversion for us, with some considerations:
* `as.character()` conversions tend to succeed without fault.
* `as.numeric()` - `TRUE` and `FALSE` become `1` and `0`, character types need to be formatted correctly.
* `as.logical()` - all numeric values except `0` become `TRUE`, character values can be upper, lower, or proper case versions.
Have a look and click 'Run Code' below to see the output.
```{r foundations-type-conversion, exercise=TRUE}
as.character(123.5)
as.numeric("123.5")
as.logical("False")
```
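The considerations above matter most when a conversion cannot succeed. In that case R usually returns `NA` (a missing value) rather than stopping with an error. A minimal sketch, using made-up input values:

```r
# A correctly formatted character converts cleanly
ok_number <- as.numeric("123.5")
# A badly formatted one becomes NA (R also gives a warning, silenced here)
bad_number <- suppressWarnings(as.numeric("12,3"))
# Every number other than 0 becomes TRUE
from_numbers <- as.logical(c(0, 1, 5))
# Text that R doesn't recognise as a logical value becomes NA
from_text <- as.logical(c("true", "T", "yes"))

print(ok_number)     # 123.5
print(bad_number)    # NA
print(from_numbers)  # FALSE TRUE TRUE
print(from_text)     # TRUE TRUE NA
```

It is worth checking for unexpected `NA` values after converting a column, for example with `sum(is.na(<converted_values>))`.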
</br>
#### Operators
These operators are common to most programming languages. The table shows the R syntax in the 'Operator' column, with rows listed in order of operation/precedence (highest first). Where precedence could be ambiguous, it is advisable to use brackets.
```{r, echo=FALSE}
operators_table <- data.frame(
  "Precedence" = c(1, 2, 3, 4, 5, 6, 7, 8),
  "Operator" = c("`^`", "`%%`", "`*` `/`", "`+` `-`", "`<` `>` `<=` `>=` `==` `!=`", "`!`", "`&` `&&`", "<code>|</code> <code>||</code>"),
  "Description" = c("Exponentiation (right to left)", "Modulus", "Multiplication, Division", "Addition, Subtraction", "Comparison Operators (Less Than, More Than, Less Than or Equal To, More Than or Equal To, Equal To, Not Equal To)", "Logical NOT", "Logical AND", "Logical OR")
)
knitr::kable(operators_table)
```
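To see precedence in action, here is a small sketch with made-up values. Each line gives a different answer depending on which operator is evaluated first, which is why brackets are the safer choice.

```r
# Exponentiation runs right to left: 2^(3^2), not (2^3)^2
power_chain <- 2^3^2
# Exponentiation binds tighter than the minus sign in front of it
negative_square <- -2^2
# Modulus is evaluated before multiplication: (7 %% 4) * 2
mod_then_mult <- 7 %% 4 * 2
# AND is evaluated before OR: TRUE | (FALSE & FALSE)
and_before_or <- TRUE | FALSE & FALSE
# Brackets make the intended order explicit and change the result
explicit_order <- (TRUE | FALSE) & FALSE

print(power_chain)      # 512
print(negative_square)  # -4
print(mod_then_mult)    # 6
print(and_before_or)    # TRUE
print(explicit_order)   # FALSE
```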
</br>
### Knowledge Check
```{r foundations-basics-quiz}
quiz(
  question("What would `typeof(as.logical(0))` return and what is its value?",
    answer("Numeric - `0`"),
    answer('Character - `"0"`'),
    answer("Logical - `TRUE`"),
    answer("Logical - `FALSE`", correct = TRUE),
    incorrect = "Not quite, have another go!",
    allow_retry = TRUE,
    random_answer_order = TRUE
  ),
  question("If `x = 5`, what does this return: `x < 10 || x == 4`?",
    answer("TRUE", correct = TRUE),
    answer("FALSE"),
    incorrect = "Not quite, have another go!",
    allow_retry = TRUE,
    random_answer_order = TRUE
  ),
  question("What makes an **invalid** name for a variable?",
    answer("Starting with an underscore (`_`)", correct = TRUE),
    answer("Starting with a dot (`.`)"),
    answer("Symbols other than an underscore (`_`) or dot (`.`)", correct = TRUE),
    answer("Starting with a number", correct = TRUE),
    answer("Reserved names, e.g. `TRUE`", correct = TRUE),
    incorrect = "Not quite, have another go!",
    allow_retry = TRUE,
    random_answer_order = TRUE
  )
)
```
## Data Structures
This is a list of the main data structures (complex data types) available in R:
* **Vectors** contain multiple objects of the same basic class
* **Lists** are a special type of vector that can contain objects of *different* basic classes, including other lists
* **Matrices** expand the dimensions with `nrow` (number of rows) and `ncol` (number of columns) arguments. By default they are filled column-wise
* **Factors** are used to represent categorical data
* **Data Frames** store tabular data, each column contains one variable, each row contains an observation
It's possible to use `str()` to get an overview and description of the data structure.
#### Vectors
* Create: `vector("<data_type>", <length>)` or `c(...)`
* Access: `<vector_name>[<index>]`
Have a look and click 'Run Code' below to see the output.
```{r foundations-structures-vector, exercise=TRUE}
vector("logical", 4)
c("a", "c", "f", "b")[1]
c(2, 5, 1, "abc")[2]
```
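The last line of the exercise above mixes numbers and text in one vector. Because a vector can only hold one basic data type, R quietly converts everything to character. The sketch below (with illustrative variable names) shows this, along with two other common vector operations: selecting several items at once and applying arithmetic to every item.

```r
# Mixing numbers and text: every item is converted to character
mixed <- c(2, 5, 1, "abc")
mixed_type <- typeof(mixed)
second_item <- mixed[2]

# Select a range of items, or drop an item with a negative index
letters_vec <- c("a", "c", "f", "b")
first_two <- letters_vec[1:2]
without_first <- letters_vec[-1]

# Arithmetic is applied to each item in turn
doubled <- c(1, 2, 3) * 2

print(mixed_type)     # "character"
print(second_item)    # "5" (text, not a number)
print(first_two)      # "a" "c"
print(without_first)  # "c" "f" "b"
print(doubled)        # 2 4 6
```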
</br>
#### Lists
* Create: `list(...)`
* Sub-list: `<list_name>[<index>]`
* Access: `<list_name>[[<index>]]`
Have a look and click 'Run Code' below to see the output.
```{r foundations-structures-list, exercise=TRUE}
list("abc", 4, FALSE, 2.5)[1:2]
list(list(2, 3), "abc")[2]
```
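The difference between `[ ]` and `[[ ]]` is easy to miss, so here is a minimal sketch using a made-up list. Single brackets give back a smaller list, while double brackets give back the item itself, which is what you need before doing anything with the value.

```r
example_list <- list("abc", 4, FALSE, 2.5)

# Single brackets: the result is still a list (a sub-list of length 1)
sub_list <- example_list[2]
# Double brackets: the result is the item itself
item <- example_list[[2]]

print(class(sub_list))  # "list"
print(class(item))      # "numeric"

# Arithmetic works on the item, but would fail on the sub-list
item_plus_one <- item + 1
print(item_plus_one)    # 5
```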
</br>
##### Naming Lists
It's possible to name list items. This can be done during creation or afterwards with the `names()` function.
* At creation: `list("<name>" = <item>)`
* After creation: `names(<list_name>) <- c("<name>")`
This adds the ability to access list items with the `$` operator and using the list item's name.
Have a look and click 'Run Code' below to see the output.
```{r foundations-structures-list-naming, exercise=TRUE}
x <- list("Ch" = "a", "Nm" = 2)
names(x) <- c("Char", "Num")
x$Char
```
</br>
#### Matrices
* Create: `matrix(<data>, nrow = <int>, ncol = <int>)`
* Access: `<matrix_name>[<row_num>, <col_num>]`
Have a look and click 'Run Code' below to see the output.
```{r foundations-structures-matrix, exercise=TRUE}
x <- matrix(1:6, 2, 3)
x
x[2, 3]
```
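Matrices are filled column by column unless told otherwise. As a small sketch, the same data is laid out both ways below using the `byrow` argument of `matrix()`; the object names are illustrative only.

```r
# Default: values fill the first column, then the second, and so on
by_column <- matrix(1:6, nrow = 2, ncol = 3)
# byrow = TRUE: values fill the first row, then the second
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(by_column[1, 2])  # 3
print(by_row[1, 2])     # 2

# Leaving an index blank selects the whole row (or column)
second_row <- by_column[2, ]
print(second_row)       # 2 4 6
print(dim(by_column))   # 2 3
```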
</br>
#### Factors
* Create: `factor(c(...))`
* Levels: `factor(c(...), levels = c(...))`
Have a look and click 'Run Code' below to see the output.
```{r foundations-structures-factors, exercise=TRUE}
factor_ex <- factor(c("low", "high", "medium", "high", "low", "medium", "high"))
factor_ex
factor_ex <- factor(factor_ex, levels = c("low", "medium", "high"))
levels(factor_ex)
```
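Setting `levels` matters because, left alone, R orders factor levels alphabetically, which is rarely the order you want for categories like low/medium/high. A minimal sketch with made-up data:

```r
sizes <- c("low", "high", "medium", "high")

# Without levels, the categories are ordered alphabetically
default_factor <- factor(sizes)
default_levels <- levels(default_factor)

# With levels, the categories follow the order we give
ordered_factor <- factor(sizes, levels = c("low", "medium", "high"))
# Behind the scenes each value is stored as the position of its level
level_codes <- as.integer(ordered_factor)
# table() counts the values and reports them in level order
level_counts <- table(ordered_factor)

print(default_levels)  # "high" "low" "medium"
print(level_codes)     # 1 3 2 3
print(level_counts)
```

The level order carries through to frequency tables and charts, so it is worth setting early.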
</br>
#### Data Frames
* Create: `data.frame("<name>" = <element(s)>)`
* Subset: `[]`
* Access: `[[]]` or `$`
Have a look and click 'Run Code' below to see the output.
```{r foundations-structures-dataframe, exercise=TRUE}
data.frame(name = c("Harry", "Sarah"), score = c(62, 91))
```
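The subset and access options listed above behave much like they do for lists. The sketch below reuses the small data frame from the exercise (stored here as `scores`, a name chosen for this example) to show what each one returns.

```r
scores <- data.frame(name = c("Harry", "Sarah"), score = c(62, 91))

# Single brackets with a column name: still a data frame (one column)
name_column <- scores["name"]
# Double brackets or $: the column itself, as a vector
score_vector <- scores[["score"]]
same_vector <- scores$score

# Rows can be picked with a condition on another column
sarah_score <- scores$score[scores$name == "Sarah"]
# [row, column] indexing: leaving the column blank keeps every column
first_row <- scores[1, ]

print(class(name_column))  # "data.frame"
print(score_vector)        # 62 91
print(sarah_score)         # 91
print(first_row)
```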
### Knowledge Check
```{r foundations-structures-quiz}
quiz(
  question("What will `c('abc', 5, TRUE, 123.5)[3]` return?",
    answer("`TRUE`", message = "This is a vector so only holds one basic data type."),
    answer("`5`"),
    answer("`123.5`", message = "R has a 1-based indexing system and this is a vector so only holds one basic data type."),
    answer("`'123.5'`", message = "R has a 1-based indexing system."),
    answer("`'TRUE'`", correct = TRUE),
    allow_retry = TRUE,
    random_answer_order = TRUE
  ),
  question("How do you return the second element from this list (not a sub-list)? (Select all that apply) `example_list <- list('Number' = 123, 'List' = list(1, 2, 3), TRUE, FALSE)`",
    answer("`example_list[2]`", message = "Remember it's the element to be returned, not a sub-list"),
    answer("`example_list[[2]]`", correct = TRUE),
    answer("`example_list$List`", correct = TRUE),
    answer("`c(1, 2, 3)`"),
    allow_retry = TRUE,
    random_answer_order = TRUE
  ),
  question("What result did Zac get? `patient_list <- data.frame(name = c('Tom', 'Jen', 'Zac', 'Kat'), \n result = c(97.6, 54.3, 21.0, 83.8))`",
    answer("97.6"),
    answer("54.3"),
    answer("21.0", correct = TRUE),
    answer("83.8"),
    allow_retry = TRUE
  ),
  question("How do you get the 2nd, 3rd, and 4th item from this list? `example_list <- list('x', 'a', 'b', 'c', 'y', 'z')`",
    answer("`example_list[2:4]`", correct = TRUE),
    answer("`example_list[1:3]`", message = "R has a 1-based indexing system"),
    answer("`example_list[[2:4]]`"),
    answer("`example_list[[1:3]]`"),
    random_answer_order = TRUE,
    allow_retry = TRUE
  )
)
```
## Functions & Packages
#### Anatomy of a Function
This is a good point to have a look at how a function is created. We'll keep this at a high level, just enough to help you understand what is happening when you use functions.
At a basic level, functions allow us to bundle code for reuse: they take inputs, do something, and provide outputs.
```
<name> <- function(<inputs>){
  <code>
  return(<outputs>)
}
```
The above shows a basic template of a function; below is a basic example. Have a look and determine what you think happens. Run the code and see if the output matches your expectations. Feel free to play around with the function code.
```{r foundations-function-anatomy, exercise=TRUE}
mult_2 <- function(x){
  x <- x * 2
  return(x)
}
mult_2(4)
```
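Functions can take more than one input, and inputs can be given a default value that is used when the caller leaves them out. The sketch below extends the idea of `mult_2()`; the function name `mult_by()` and its `factor` argument are made up for this illustration.

```r
# Hypothetical helper: multiply x by a factor, doubling by default
mult_by <- function(x, factor = 2) {
  result <- x * factor
  return(result)
}

# Uses the default factor of 2
default_result <- mult_by(4)
# Overrides the default by naming the argument
custom_result <- mult_by(4, factor = 10)
# The input can be a whole vector, not just one number
vector_result <- mult_by(c(1, 2, 3))

print(default_result)  # 8
print(custom_result)   # 40
print(vector_result)   # 2 4 6
```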
</br>
### Packages
On top of base R are packages. These are bundles of code that expand the functionality of base R, allowing people to write and share functions that share expertise. There are well over 10,000 packages available on CRAN (Comprehensive R Archive Network) and more available outside of this network. PHS have their own suite of packages too, `phsverse`, including `phsmethods` (a bundle of functions that are common for PHS staff) and `phsopendata` (functions for interacting with the Scottish Health and Social Care Open Data platform).
It's on top of both base R and packages that user code sits. Not all user code makes use of packages, but they can make writing code easier and improve efficiency.
```{r foundations-rstructure, echo=FALSE, fig.align='center', out.width="100%"}
knitr::include_graphics("images/r-foundations-structure.png")
```
To use packages, they must first be installed on your system. This only needs to be done once; it's like installing new software on your computer. Packages are installed by using `install.packages("<package_name>")`.
To use the functions within a package, it has to be loaded in each new session of R. This is done with `library(<package_name>)`. *Another function that loads packages is `require(<package_name>)`. However, if the package cannot be loaded, it only gives a warning and returns `FALSE` rather than halting the program with an error, so it is typically only used within other functions with supporting code.*
When loading a package, it is common for a message (not an error) to be shown. This appears when a function within the package has the same name as another function already available in the R session (potentially from base R or another package). The most recently loaded package's function will "mask" the other. To be explicit about which function to use, you can write `<package_name>::<function>()`, e.g. `dplyr::filter()`.
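As a rough sketch of the difference between `library()` and `require()`, the code below tries to load a package that does not exist. The package name is deliberately made up, and the error from `library()` is caught with `tryCatch()` only so the example can carry on.

```r
# A made-up package name that should not be installed anywhere
missing_pkg <- "notARealPackage123"

# require() returns FALSE (with a warning, silenced here) and carries on
require_result <- suppressWarnings(
  require(missing_pkg, character.only = TRUE, quietly = TRUE)
)

# library() stops with an error, caught here so the example keeps running
library_result <- tryCatch(
  {
    library(missing_pkg, character.only = TRUE)
    "loaded"
  },
  error = function(e) "error"
)

# The :: form names the package explicitly, so masking cannot change
# which function is used
explicit_median <- stats::median(c(1, 3, 10))

print(require_result)   # FALSE
print(library_result)   # "error"
print(explicit_median)  # 3
```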
</br>
#### Tidyverse
<div class = "tidyverse-logo">
```{r packages-tidyverse, echo=FALSE, fig.align='center', out.width="60%"}
knitr::include_graphics("images/r-packages-tidyverse.png")
```
</div>
Tidyverse is a suite of packages that share a common theme (and an opinionated style guide, which we have adopted in PHS). We'll look at a couple of the Tidyverse packages later in [Wrangle]. It is not recommended to load Tidyverse in its entirety, but rather each package that you'll specifically use (e.g. `dplyr`). Tidyverse has grown dramatically, so loading the whole suite will use a lot of resource unnecessarily.
</br>
While it's not possible to install packages on this course page, you can load them. `dplyr` has been pre-loaded for you, and the below is a typical output.
```{r dplyr-load, eval=FALSE, include=TRUE, echo=TRUE}
library(dplyr)
```
```{r dplyr-load-output, eval=FALSE, include=TRUE, echo=TRUE}
Attaching package: 'dplyr'
The following objects are masked from 'package:stats':
    filter, lag
The following objects are masked from 'package:base':
    intersect, setdiff, setequal, union
```
You can see that `dplyr` has functions with the same names as functions in a package called `stats` (`filter()` and `lag()`) and in `base` (`intersect()`, `setdiff()`, `setequal()`, and `union()`). As `dplyr` was the latest package to be loaded, the functions from `dplyr` "mask" the functions from the other packages.
</br>
#### PHSverse
PHS have developed a suite of packages, [`phsverse`](https://github.com/Public-Health-Scotland/phsverse), with the aim of tackling some of the most common tasks, e.g. dealing with CHI numbers and following our brand guidelines in visualisations. The packages included in [`phsverse`](https://github.com/Public-Health-Scotland/phsverse) are outlined below.
* `phsmethods` - functions for common analytical tasks
* `phsopendata` - functions to extract and interact with data from the Scottish Health and Social Care Open Data platform
* `phsstyles` - functions for standard graphic styling
* `phstemplates` - standard R templates and projects
## Control Flow & Iteration
In programming, control flow is where decisions are made and iteration is where things are repeated. These are the foundation of making a program work for us and are likely to be used within the functions we just looked at.
### Control Flow - if
* Package: `dplyr`
* `if_else(<condition>, <true>, <false>)`
```{r cfi-if, exercise=TRUE}
x <- 5
if_else(x > 5, "`x` is more than 5", "`x` is less than or equal to 5")
```
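`if_else()` checks every item in a vector and returns one answer per item. Base R also has a plain `if` / `else` statement, which makes a single decision from one `TRUE` or `FALSE` value, and `ifelse()`, a vectorised base R function similar to `if_else()`. A minimal sketch of both, with illustrative variable names:

```r
x <- 5

# Base R if / else: one condition, one decision
if (x > 5) {
  single_result <- "x is more than 5"
} else {
  single_result <- "x is less than or equal to 5"
}
print(single_result)  # "x is less than or equal to 5"

# Base R ifelse(): one answer for each item in the vector
values <- c(3, 5, 8)
vector_result <- ifelse(values > 5, "more than 5", "5 or less")
print(vector_result)  # "5 or less" "5 or less" "more than 5"
```

As a rule of thumb, the `if` / `else` statement suits decisions about what code to run, while the vectorised versions suit creating or recoding a column of values.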
### Control Flow - case
* Package: `dplyr`
* `case_when(<condition> ~ <result>)`
<div>
```{r echo=FALSE, fig.align='center', out.width="75%"}
knitr::include_graphics("images/illustrations/r_dplyr_case_when.png")
```
</div>
```{r cfi-case, exercise=TRUE}
x <- c(1, 2, 3, 4, 5)
case_when(x < 3 ~ "LT3",
          x %% 2 == 0 ~ "Even")
```
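Two things are worth noticing about `case_when()`: conditions are checked in order and the first match wins, and any value that matches no condition becomes `NA`. A common way to catch the leftovers is a final `TRUE ~ <result>` line. The sketch below assumes `dplyr` is installed; the "Other" label is made up for the example.

```r
library(dplyr)

x <- c(1, 2, 3, 4, 5)

labels <- case_when(
  x < 3 ~ "LT3",         # checked first, so 2 is "LT3" rather than "Even"
  x %% 2 == 0 ~ "Even",
  TRUE ~ "Other"         # fallback for anything not matched above
)

print(labels)  # "LT3" "LT3" "Other" "Even" "Other"
```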
### Iteration - for
* `for(<value> in <sequence>) {<statement>}`
<div>
```{r echo=FALSE, fig.align='center', out.width="75%"}
knitr::include_graphics("images/illustrations/r_for_loop.png")
```
</div>
```{r cfi-for, exercise=TRUE}
x <- c(1, 2, 3, 4, 5)
even_count <- 0
for(i in x) {
  if(i %% 2 == 0) {
    even_count <- even_count + 1
  }
}
print(even_count)
```
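Many loops in R can be replaced by a vectorised calculation, because operators like `%%` and `==` already work on every item of a vector. The sketch below counts the even numbers without a loop, then shows a case where a loop is still a natural fit (a running total). Variable names are illustrative.

```r
x <- c(1, 2, 3, 4, 5)

# The comparison returns TRUE/FALSE for every item at once
is_even <- x %% 2 == 0
# sum() counts TRUE as 1 and FALSE as 0
even_count <- sum(is_even)
print(even_count)  # 2

# A loop over positions, using seq_along() to generate 1, 2, ..., length(x)
running_total <- numeric(length(x))
for (i in seq_along(x)) {
  running_total[i] <- sum(x[1:i])
}
print(running_total)  # 1 3 6 10 15
```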
### Iteration - purrr loop
* Package: `purrr`
* `map(<object>, <function>)`
```{r cfi-purrr, exercise=TRUE}
add_10 <- function(x) {
  x + 10
}
x <- c(1, 2, 3, 4, 5)
map(x, add_10)
```
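`map()` always returns a list. When every result is a single number, `purrr` also offers `map_dbl()`, which returns an ordinary numeric vector instead. The sketch below assumes `purrr` is installed and also shows the `~ .x` shorthand for writing a small function in place.

```r
library(purrr)

add_10 <- function(x) {
  x + 10
}

x <- c(1, 2, 3, 4, 5)

# map() returns a list with one element per input
as_list <- map(x, add_10)
# map_dbl() returns a numeric vector
as_vector <- map_dbl(x, add_10)
# The same calculation written with the formula shorthand
shorthand <- map_dbl(x, ~ .x + 10)

print(class(as_list))  # "list"
print(as_vector)       # 11 12 13 14 15
print(shorthand)       # 11 12 13 14 15
```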
### Knowledge Check
Have a look at the code below, then answer the quiz questions.
```{r, eval=FALSE, echo=TRUE}
if (number < 10) {
  if (number < 5) {
    result <- "extra small"
  } else {
    result <- "small"
  }
} else if (number < 100) {
  result <- "medium"
} else {
  result <- "large"
}
```
```{r if-else-quiz}
quiz(
  question("Which 2 statements are true?",
    answer('If `number` = `6`, `result` = "small".', correct = TRUE),
    answer('If `number` = `100`, `result` = "medium".'),
    answer('If `number` = `4`, `result` = "extra small".', correct = TRUE),
    answer('If `number` = `2500`, R will generate an error.'),
    incorrect = "Not quite, have another go!",
    allow_retry = TRUE
  )
)
```
## RStudio
RStudio is the software we use to write and run R code, also known as an Integrated Development Environment (IDE). In PHS we have the [RStudio Server](https://rstudio.nhsnss.scot.nhs.uk/), which takes the processing strain away from your local machine and prevents the need to pull data to process or run analyses. Work is ongoing in the background to improve the infrastructure and maintain the highest security, while also allowing new and innovative platforms for sharing our products and services.

</br>
#### Guidance
* **Close Sessions** - leaving sessions open or running multiple sessions uses resource. As the server is a shared resource, this can affect the whole organisation. When finished, close your session using the red power button at the top right.
* **Store Data Properly** - projects and files shouldn’t be stored in your ‘home’ drive and unused data should be removed from your environment.
* **Retrieve Data Efficiently** - limit the data extracted using well written SQL.
</br>
#### R Projects
Using RStudio also gives us the use of R Projects. Using R Projects keeps our projects separate, giving the project its own working directory, workspace, and history.
Opening an .Rproj file will:
* Start a new R session
* Load project specifics and related settings
* Set the project directory as the current working directory
<div class = "supporting-image-left">
```{r rstudio-newproj, echo=FALSE, fig.align='center', out.width="75%"}
knitr::include_graphics("images/r-rstudio-newproj.png")
```
</div>
To create a new R project, open this dialogue box by going to 'File > New Project...' or click 'Create a project' in the global toolbar.
* **New Directory** - creates a new directory/folder in the place you specify with the .Rproj file inside.
* **Existing Directory** - takes an existing directory/folder and associates the project there.
* **Version Control** - this takes a project from version control (such as one on GitHub) and creates the directory/folder in the place you specify.
</br>
## Data Flow
The chances are that, when you're working in R, you'll be working with some kind of data-set. The typical workflow involves bringing that data in, exploring, wrangling, creating visualisations, and then creating some kind of output. This workflow is outlined below with important foundations for projects to follow.
```{r workflow, echo=FALSE, fig.align='center', out.width="100%"}
knitr::include_graphics("images/r-workflow.png")
```
*Unfortunately, it's not possible to have the code exercises in this course reach directories for reading and writing files. To test these functions for yourself, the data we'll be working with is available [here](https://github.com/Public-Health-Scotland/learnr-intro/tree/master/data) on GitHub. The data will be pre-loaded for you in later exercises.*
### Working Directory
The working directory is the current directory/folder associated with your project. Working directories allow for shorter file paths that are *relative* to your working directory. This is associated with the use of [R Projects](https://r4ds.had.co.nz/workflow-projects.html).
<div>
```{r echo=FALSE, fig.align='center', out.width="75%"}
knitr::include_graphics("images/illustrations/r_cracked_setwd.png")
```
</div>
Base R has its own functions for navigating your working directory:
* Current working directory: `getwd()`
* Set new working directory: `setwd("<filepath>")`
However, the recommended method is **using the `here` package**. This gives us the `here()` function, which works similarly to `getwd()` but is more forgiving in how it searches for files and directories.
RStudio also provides options through the user interface for navigating directories. Other functions, such as `list.files()` (which lists the files in a directory), are also available in R.
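To make the idea of relative paths concrete, here is a minimal sketch. The `data/Borders.csv` path is used purely as an illustration and the file does not need to exist for the code to run. The last step only runs if the `here` package happens to be installed.

```r
# A relative path: R resolves it against the current working directory
relative_path <- file.path("data", "Borders.csv")
# The absolute equivalent, built from getwd()
absolute_path <- file.path(getwd(), relative_path)

print(relative_path)
print(absolute_path)

# With the here package, the path is built from the project root instead
if (requireNamespace("here", quietly = TRUE)) {
  print(here::here("data", "Borders.csv"))
}
```

Relative paths keep code portable: the same script works for a colleague whose project folder sits somewhere else.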
### CSV
The recommended package for working with CSVs is `readr`.
* Read: `read_csv("<filepath>")`
* Write: `write_csv(<object>, "<filepath>")`
```{r csv-read, eval=FALSE, include=TRUE, echo=TRUE}
library(readr)
borders_csv <- read_csv("data/Borders.csv")
```
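If you'd like to try reading and writing without any course data, the sketch below writes a small made-up data frame to a temporary file and reads it back. It assumes `readr` is installed; `tempfile()` is used so that nothing is saved into your project.

```r
library(readr)

# A small made-up data-set
example_data <- data.frame(name = c("Harry", "Sarah"), score = c(62, 91))

# A temporary file location that R cleans up at the end of the session
csv_path <- tempfile(fileext = ".csv")

# Write the data out, then read it back in
write_csv(example_data, csv_path)
example_csv <- read_csv(csv_path)

print(example_csv)
```

`read_csv()` also reports the data type it guessed for each column, which is a useful first check that the file was read as expected.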
### SPSS
Working with SPSS files requires the use of an R package, the recommended package to use is `haven`.
* Read: `read_sav("<filepath>")`
* Write: `write_sav(<object>, "<filepath>")`
```{r spss-read, eval=FALSE, include=TRUE, echo=TRUE}
library(haven)
borders_spss <- read_sav("data/Borders.sav")
```
### RDS
RDS (R Data Single) stores a single R object.
* Read: `readRDS("<filepath>")`
* Write: `saveRDS(<object>, "<filepath>")`
```{r rds-read, eval=FALSE, include=TRUE, echo=TRUE}
borders_rds <- readRDS("data/borders.rds")
```
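One reason to prefer RDS over CSV for intermediate files is that the R object comes back exactly as it was saved, including details like factor levels. A minimal sketch using a made-up factor and a temporary file:

```r
# A factor with a deliberate level order
sizes <- factor(c("low", "high"), levels = c("low", "high"))

# A temporary file location, so nothing is saved into your project
rds_path <- tempfile(fileext = ".rds")

# Save the object, then read it back in
saveRDS(sizes, rds_path)
sizes_restored <- readRDS(rds_path)

# The restored object matches the original, level order included
print(identical(sizes, sizes_restored))  # TRUE
print(levels(sizes_restored))            # "low" "high"
```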
### Web
The packages/functions used will vary depending on the structure of the data hosted on the web. This example uses a CSV so the process is very similar to before.
* Read: `read_csv("<url>")`
```{r web-read, eval=FALSE, include=TRUE, echo=TRUE}
library(readr)
borders_csv <- read_csv("https://www.opendata.nhs.scot/dataset/cbd1802e-0e04-4282-88eb-d7bdcfb120f0/resource/c698f450-eeed-41a0-88f7-c1e40a568acc/download/current_nhs_hospitals_in_scotland_010720.csv")
```
### Databases (SMRA)
The packages/functions used will vary depending on the structure of the database. This example, for SMRA, uses the package `odbc`.
* Connect (running this will then prompt you for your user credentials):
``` {r smra-connect, eval=FALSE, include=TRUE, echo=TRUE}
# odbc provides the odbc() driver and the dbConnect() function
library(odbc)
smra_connection <- dbConnect(drv = odbc(),
                             dsn = "SMRA",
                             uid = .rs.askForPassword("SMRA Username:"),
                             pwd = .rs.askForPassword("SMRA Password:"))
```
* Extract:
``` {r smra-extract, eval=FALSE, include=TRUE, echo=TRUE}
smr01 <- dbGetQuery(smra_connection, paste("<sql_query>"))
```
## Explore
### Mean/Median & Summary
* `mean()` and `median()` are passed a vector of values (usually a column from a data frame) and return the mean and median value.
* `summary()` returns a set of summary statistics (minimum, quartiles, median, mean, and maximum) for a given numeric vector.
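Real data often contains missing values (`NA`), and by default `mean()` and `median()` return `NA` if any value is missing. The `na.rm = TRUE` argument tells them to ignore missing values. A minimal sketch with made-up numbers:

```r
# Made-up lengths of stay, with one missing value
stays <- c(2, 4, 4, 10, NA)

# A single NA makes the default result NA
mean_with_na <- mean(stays)
# na.rm = TRUE drops missing values before calculating
mean_stay <- mean(stays, na.rm = TRUE)
median_stay <- median(stays, na.rm = TRUE)

print(mean_with_na)  # NA
print(mean_stay)     # 5
print(median_stay)   # 4

# summary() reports the number of NAs alongside the other statistics
print(summary(stays))
```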
You now have the borders data-set loaded as `borders_data`. See if you can get the mean value for `LengthOfStay`. Use the hint button if you need some help.
```{r mean, exercise=TRUE, exercise.eval=TRUE}
borders_data
```
```{r mean-hint-1}
mean(borders_data$...)
```
```{r mean-solution}
mean(borders_data$LengthOfStay)
```
```{r mean-check}
grade_code()
```
### Frequencies & Crosstabs
* Frequency: `table(<df_name>$<col_name>)`
* Crosstab: `table(<df_name>$<col_name1>, <df_name>$<col_name2>)`
* Add Col/Row Totals: `addmargins()`
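As a small illustration of these three functions, the sketch below uses two made-up vectors rather than the course data. In practice they would usually be columns from a data frame.

```r
# Made-up data: the ward and sex of five patients
ward <- c("A", "A", "B", "B", "B")
sex <- c("F", "M", "F", "F", "M")

# Frequency: how many patients per ward
ward_counts <- table(ward)
# Crosstab: wards as rows, sex as columns
ward_by_sex <- table(ward, sex)
# Add a "Sum" row and column of totals
with_totals <- addmargins(ward_by_sex)

print(ward_counts)
print(with_totals)
```

Individual cells can be picked out by name, e.g. `ward_by_sex["B", "F"]` gives the number of female patients in ward B.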
Create a crosstab for `HospitalCode` and `Sex`, add column and row totals. Use the hint button if you need some help.
```{r freq, exercise=TRUE}
```
```{r freq-hint-1}
...(table(...))
```
```{r freq-hint-2}
addmargins(table(...))
```
```{r freq-hint-3}
addmargins(table(borders_data$..., borders_data$...))
```
```{r freq-solution}
addmargins(table(borders_data$HospitalCode, borders_data$Sex))
```
```{r freq-check}
grade_code()
```
## Wrangle - Part 1
<div class = "tidyverse-logo">
```{r packages-dplyr, echo=FALSE, fig.align='center', out.width="60%"}
knitr::include_graphics("images/r-packages-dplyr.png")
```
</div>
In [Packages] we spoke about Tidyverse, a suite of packages for data exploration, manipulation, and visualisation. Within the suite of packages, we get `dplyr`, the grammar of data manipulation. This package provides us with a set of "verbs" to help solve most data manipulation challenges:
<br>
`filter()` `mutate()` `arrange()` `select()` `group_by()` `summarise()` `count()` `rename()` `recode()`
<div>
```{r echo=FALSE, fig.align='center', out.width="75%"}
knitr::include_graphics("images/illustrations/r_tidydata.jpg")
```
</div>
### Pipe Operator
Before going on to the functions that make up `dplyr`, we need to talk about the pipe `%>%` (shortcut: `ctrl` + `shift` + `M`). The pipe links functions together, passing the result of the previous function in as the first argument of the next. Using the pipe makes R code more readable and prevents nested parentheses from building up across multiple function calls. An example is below; you can see that the results are the same. Don't worry too much about the functions, as we get into those next (`dplyr` has been loaded).
```{r pipe-example, exercise=TRUE}
# Not using pipe operator
arrange(filter(borders_data, HospitalCode == "B120H"), Dateofbirth)
# With pipe operator
borders_data %>%
filter(HospitalCode == "B120H") %>%
arrange(Dateofbirth)
```
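To make the equivalence concrete, here is a small sketch using base R functions on a made-up vector. The names `x`, `nested` and `piped` are hypothetical; the only assumption is that `dplyr` is installed, since loading it makes `%>%` available.

```r
library(dplyr)  # loading dplyr also makes the %>% pipe available

x <- c(1, 4, 9)

# Nested calls are read from the inside out
nested <- round(mean(sqrt(x)), 1)

# Piped calls are read from top to bottom
piped <- x %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(nested, piped)  # TRUE
```

Each step receives the previous result as its first argument, so `x %>% round(1)` is just another way of writing `round(x, 1)`.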
For each of the following `dplyr` functions, the plain function call is given first, followed by the equivalent call using the pipe.
### Filter
Picks cases based on their values:
* `filter(<data>, <logical_expression>)`
* With pipe: `<data> %>% filter(<logical_expression>)`
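A minimal sketch of how logical expressions combine inside `filter()`, using a small made-up data frame. The names `toy`, `hospital` and `los` are hypothetical and are not columns of `borders_data`.

```r
library(dplyr)

# Toy data frame (made-up values)
toy <- data.frame(
  hospital = c("A", "A", "B", "B"),
  los      = c(3, 12, 15, 1)
)

# Both conditions must hold (AND): keeps the single row A / 12
and_rows <- toy %>% filter(hospital == "A" & los > 10)

# Separating conditions with a comma is equivalent to &
comma_rows <- toy %>% filter(hospital == "A", los > 10)

# Either condition may hold (OR): keeps three rows
or_rows <- toy %>% filter(hospital == "A" | los > 10)
```

Note that `>` excludes the boundary value itself; use `>=` when the boundary should be kept.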
<div>
```{r echo=FALSE, fig.align='center', out.width="75%"}
knitr::include_graphics("images/illustrations/r_dplyr_filter.jpg")
```
</div>
Filter `borders_data` to the rows where `HospitalCode` is "B120H" and `LengthOfStay` is more than 10. Use the hint button if you need some help.
```{r filter-example, exercise=TRUE}
```
```{r filter-example-hint-1}
borders_data %>%
...(... == ... & ... > ...)
```
```{r filter-example-hint-2}
borders_data %>%
filter(... == ... & ... > ...)
```
```{r filter-example-hint-3}
borders_data %>%
filter(HospitalCode == ... & LengthOfStay > ...)
```
```{r filter-example-solution}
borders_data %>%
filter(HospitalCode == "B120H" & LengthOfStay > 10)
```
```{r filter-example-check}
grade_result(
pass_if(~ identical(as.character(.result$URI[1]), "28") & identical(as.character(.result$URI[1000]), "12598")),
fail_if(~ identical(as.character(.result$URI[1]), "4"), "Did you forget to include LengthOfStay?"),
fail_if(~ identical(as.character(.result$URI[10]), "138"), "Did you forget to include HospitalCode?"),
fail_if(~ identical(as.character(.result$URI[1000]), "10838"), "Did you filter to include where LengthOfStay is 10 or more? We're looking for more than 10 days."),
fail_if(~ TRUE)
)
```
### Mutate
Adds new variables that are functions of existing variables:
* `mutate(<data>, <new-col> = <expression>)`
* With pipe: `<data> %>% mutate(<new-col> = <expression>)`
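As a rough sketch on a made-up data frame, `mutate()` can create several columns in one call, and later expressions can use columns created earlier in the same call. The names `toy`, `los`, `los_weeks` and `long_stay` are hypothetical.

```r
library(dplyr)

# Toy data frame (made-up values): length of stay in days
toy <- data.frame(los = c(7, 14, 21))

result <- toy %>%
  mutate(
    los_weeks = los / 7,       # new column from an existing one
    long_stay = los_weeks > 1  # new column from the one just created
  )

ncol(toy)     # 1: the original data frame is unchanged
ncol(result)  # 3: the new columns only exist in the assigned result
```

Because `mutate()` returns a new data frame rather than modifying its input, the result has to be assigned to a name if it is needed later.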
<div>
```{r echo=FALSE, fig.align='center', out.width="75%"}
knitr::include_graphics("images/illustrations/r_dplyr_mutate.png")
```
</div>
Create a new column in `borders_data` that is equal to `LengthOfStay` divided by 2. Name this column `los_div2`. Use the hint button if you need some help.
```{r mutate-example, exercise=TRUE}
```
```{r mutate-example-hint-1}
borders_data %>%
...(... = ...)
```
```{r mutate-example-hint-2}
borders_data %>%
mutate(... = ...)
```
```{r mutate-example-hint-3}
borders_data %>%
mutate(los_div2 = ... / 2)
```
```{r mutate-example-solution}
borders_data %>%
mutate(los_div2 = LengthOfStay / 2)
```
```{r mutate-example-check}
grade_result(
pass_if(~ identical(as.character(.result$los_div2[1]), "2") & identical(as.character(.result$los_div2[1000]), "4.5")),
fail_if(~ TRUE)
)
```
### Arrange
Orders rows by the values of one or more variables (ascending by default):
* `arrange(<data>, <variables>)`
* `arrange(<data>, desc(<variables>))` to sort in descending order
* With pipe: `<data> %>% arrange(<variables>)`
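The sorting options above can be sketched on a small made-up data frame. The names `toy`, `hospital` and `los` are hypothetical.

```r
library(dplyr)

# Toy data frame (made-up values)
toy <- data.frame(
  hospital = c("B", "A", "B", "A"),
  los      = c(1, 12, 15, 3)
)

# Ascending (the default): los becomes 1, 3, 12, 15
asc <- toy %>% arrange(los)

# Descending: los becomes 15, 12, 3, 1
dsc <- toy %>% arrange(desc(los))

# Several variables: sort by hospital, then by longest stay within each hospital
multi <- toy %>% arrange(hospital, desc(los))
```

When several variables are given, each later variable is only used to break ties in the earlier ones.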
Sort `borders_data` by `HospitalCode`. Use the hint button if you need some help.
```{r arrange-example, exercise=TRUE}
```
```{r arrange-example-hint-1}
borders_data %>%
...(...)
```
```{r arrange-example-hint-2}
borders_data %>%
arrange(...)
```
```{r arrange-example-solution}
borders_data %>%
arrange(HospitalCode)
```
```{r arrange-example-check}
grade_result(
pass_if(~ identical(as.character(.result$URI[1]), "1763") & identical(as.character(.result$URI[1000]), "19971")),
fail_if(~ identical(as.character(.result$URI[1]), "7503"), "Did you arrange in descending order?"),
fail_if(~ TRUE)
)
```
### Select
Picks variables based on their names:
* `select(<data>, <variable>)`
* With pipe: `<data> %>% select(<variable>)`
* Prepend `-` to a variable to remove it
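A brief sketch of keeping versus dropping columns, using a made-up data frame. The names `toy`, `id`, `postcode` and `los`, and the values in them, are hypothetical.

```r
library(dplyr)

# Toy data frame (made-up values)
toy <- data.frame(
  id       = 1:2,
  postcode = c("TD1", "TD9"),
  los      = c(3, 12)
)

# Keep only the named columns
keep <- toy %>% select(id, los)

# Drop one column and keep everything else
drop <- toy %>% select(-postcode)

identical(names(keep), names(drop))  # TRUE: both leave id and los
```

Dropping with `-` tends to be the shorter option when only one or two columns out of many need to go.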
Remove the `Postcode` variable from `borders_data`. Use the hint button if you need some help.
```{r select-example, exercise=TRUE}
```
```{r select-example-hint-1}
borders_data %>%
...(...)
```
```{r select-example-hint-2}
borders_data %>%
select(...)
```
```{r select-example-hint-3}
borders_data %>%
select(-...)
```
```{r select-example-solution}
borders_data %>%
select(-Postcode)
```
```{r select-example-check}
grade_result(
pass_if(~ (!"Postcode" %in% colnames(.result)) && all(c("URI", "HospitalCode", "Specialty", "MOP", "Main_Condition", "Main_op", "Dateofbirth", "DateofAdmission", "DateofDischarge", "Sex", "LinkNo", "LengthOfStay", "HBRes") %in% colnames(.result))),
fail_if(~ TRUE)
)
```
### Knowledge Check