# Introduction to R
## Learning outcomes
By the end of this practical you should be able to:
1. Execute basic processing in R
2. Examine, clean and manipulate comma separated value (```.csv```) data
3. Examine, clean, manipulate and plot spatial (```.shp```) data
4. Produce interactive maps
5. Evaluate the benefits of different data manipulation and mapping techniques
## Recommended listening
Some of these practicals are long, so take regular breaks and have a listen to some of our fav tunes each week.
[Andy](https://www.youtube.com/watch?v=A48hOToMuRE)
[Adam](https://open.spotify.com/album/0m9oaRdpRS9n0mq2bECgNQ?si=Dxu8_-kmSy6jq_oFN5gJ4A). Recommended listening this week comes courtesy of “Pound for pound the best rock band on the planet at the minute. In my, not so humble opinion.” – Jake Burns, Stiff Little Fingers. Yes, it’s the Wildhearts and I can confirm that they are the greatest rock band on the planet! Wrap your ears around this:
## Introduction
This practical is **LONG** but it will take you from not knowing much about R to making freaking cool interactive maps in one practical. As you can imagine, this will be a steep learning curve.
I will give you all the code you need; it’s your job to read through the text very carefully and try to understand what bits of code are doing as you go.
There will be bits of code you don’t fully understand. Don’t worry, the key is to revisit later and try to work out what is going on then. Learning R is a long and iterative process and this is just the start…
If you want to learn more about R and indeed download the latest version for your own use, then visit the [R project pages](http://www.r-project.org/)
The Wikipedia page for those who want to know a little of the history of R can be found [here](http://en.wikipedia.org/wiki/R_(programming_language))
There is an almost endless supply of good R tutorials on the web. If you get stuck or want to learn even more R (and why would you not want to?!), I’d recommend trying some of the following R Tutorial websites:
* http://www.statmethods.net/index.html
* http://www.r-tutor.com/
* http://www.cyclismo.org/tutorial/R/index.html
* http://www.cookbook-r.com/
If you want to really be up to date with the state of the art in R, then [bookdown](https://bookdown.org/) is a fantastic resource. It features **free** books by some of the pre-eminent names in the R scene --- I would urge you to go and take a look.
### Online forums are your friend!!
With almost every problem you encounter with R, someone else will have had the same problem before you and posted it on a forum, and someone will then have posted a solution below.
My usual route is to Google the problem and I’ll then be directed to a post, usually on Stack Overflow, Stack Exchange or Cross Validated. When doing so, try to think about the minimal working (or not working) example (MWE); by this I mean remove anything very specific to your problem. I’ve rarely not found a solution to a problem this way.
### Health warning
Beware of posting questions on these forums yourself – contributors to these forums (especially the R ones!), whilst almost always extremely knowledgeable about R, have a bit of a reputation for being [insert familiar pejorative term for less-than-polite-human-being here]! As you might expect, people who have enough time to become total experts in R, have little time to work on their social skills!! Fortunately though, some other poor chump has usually taken that hit for you and you can still find a useful answer to your problem.
If you are specifically more interested in the spatial side of R, then Alex Singleton and Chris Brunsdon at the Universities of Liverpool and Maynooth also have a number of very useful R Spatial Tutorials – http://rpubs.com/alexsingleton/ & http://rpubs.com/chrisbrunsdon/
Robin Lovelace in Leeds is also frequently at the bleeding edge of developments in R spatial stuff, so keep an eye on his [website](http://robinlovelace.net/). Robin has also made a book on GeoComputation in R, which you should definitely read! --- https://geocompr.robinlovelace.net/
These websites are also very very good: https://pakillo.github.io/R-GIS-tutorial/ and http://www.maths.lancs.ac.uk/~rowlings/Teaching/UseR2012/cheatsheet.html
### R and RStudio
When you download and install R, you get the R Graphical User Interface (GUI) as standard (below). This is fine and some purists prefer using the clean, unfussy command-line original, but it has some limitations such as no graphical way to view data tables or keep track of what is in your working directory (there are a number of others too).
```{r echo=FALSE, out.width = "500px", fig.align='center', cache=TRUE}
knitr::include_graphics('prac2_images/R.png')
```
Fortunately there are a number of software environments that have been developed for R to make it a little more user-friendly; the best of these by a long way is RStudio. RStudio can be downloaded for free from https://www.rstudio.com/. We covered the RStudio layout last week.
### Getting started
If you are some kind of masochist, you are welcome to use the bundled R GUI for all of your work. If pain is not your thing, then for this practical (and future practicals) I will assume that you are using RStudio.
1. From the start menu on your computer, find and run RStudio
Once RStudio has opened, the first thing we will do is create a new project – projects enable you to organise your work effectively and store all of the files you create and work with for a particular task.
2. To create a new project (and this will vary a little depending on the version of RStudio you are using) select File > New Project
3. Select Start a project in a brand new working directory and create the project in a new ‘wk2’ directory on your N: drive:
```{r echo=FALSE, out.width = "500px", fig.align='center', cache=TRUE}
knitr::include_graphics('prac2_images/r_newproject.png')
```
My file directory (the second box here) will be different to yours as this is my teaching resources folder. Keep yours simple, for example ```N:/GIS/wk2```.
Setting up a project is extremely useful as it lets you easily access your data and files. For example, the flytipping ```.csv``` we used last week is stored at the file path:
```{r eval=FALSE, cache=TRUE}
mycsv <- read_csv("C:/Teaching/CASA0005repo/Prac1_data/fly_tipping_borough_edit.csv")
```
However, as I've set my R project up in the CASA0005repo folder with different data folders for each week, I can just use:
```{r eval=FALSE, cache=TRUE}
mycsv <- read_csv("Prac1_data/fly_tipping_borough_edit.csv")
```
If I had the ```.csv``` file in the same folder as my project, I could just use:
```{r eval=FALSE, cache=TRUE}
mycsv <- read_csv("fly_tipping_borough_edit.csv")
```
You can run this in the Console area now or within a script which we will now go over...
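If a relative path ever fails, it helps to see what R thinks the full path is. The sketch below is only an illustration: it reuses the folder and file name from the examples above and assumes nothing about whether the file actually exists on your machine.

```r
# Build a path relative to the project folder (names taken from the example above)
rel_path <- file.path("Prac1_data", "fly_tipping_borough_edit.csv")

# The same location written as a full path, starting from the working directory
abs_path <- file.path(getwd(), rel_path)

# Check the file is really there before trying to read it
if (file.exists(rel_path)) {
  message("Found: ", abs_path)
} else {
  message("Not found, check getwd() and the folder name: ", abs_path)
}
```

If you see the "Not found" message, the usual culprit is that the project was not opened, so the working directory is somewhere else.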
### Basics
1. R has a very steep learning curve, but hopefully it won’t take long to get your head around the basics. For example, at its most simple R can be used as a calculator. In the console window (bottom left), just type the following and press enter:
```{r cache=TRUE}
1+5
```
or
```{r cache=TRUE}
4*5^2
```
As you can see R performs these calculations instantly and prints the results in the console. This is useful for quick calculations but less useful for writing scripts requiring multiple operations or saving these for future use.
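The second calculation above works because R follows the usual order of operations. A small sketch, with object names that are purely illustrative:

```r
# Exponentiation (^) is evaluated before multiplication (*)
power_first <- 4 * 5^2        # 4 * 25 = 100

# Brackets force the multiplication to happen first
product_first <- (4 * 5)^2    # 20^2 = 400

# Integer division and remainder are also built in
whole <- 17 %/% 5             # 3
remainder <- 17 %% 5          # 2

print(c(power_first, product_first, whole, remainder))
```

When in doubt, add brackets; they cost nothing and make the intention obvious.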
2. To save your scripts, you should create a new R Script file. Do this now: Select File > New File > R Script.
3. The R Script should open up on the top-left of your GUI. **From now on type everything in this R script file and save it**
### Scripts and some basic commands
4. Usually one of the first things to do when starting a new R Script is to check that you are in the correct working directory. This is important especially if you are working on multiple projects in different locations. To do this type the following into your new empty R Script:
```{r cache=TRUE}
getwd()
```
5. To run this line, hold Ctrl (Cmd on a Mac) and press the Return(↲) key (if you are in the standard R installation, you would run your script with Ctrl R). You should now see your current working directory appear in the console.
6. Because of the new project we have already set up, this working directory should be correct, but if for any reason we wanted to change the working directory, we would use the ```setwd()``` function. For example, if we wanted to change our directory to the documents folder on the C drive, we could run (don’t do this now):
```{r eval=FALSE, cache=TRUE}
setwd("C:/Documents")
```
7. When we are sure we are working in the correct working directory, we can save our script by clicking on the save icon on the script tab. Save your script as something like “wk2_part1” and you will see it appear in your files window on the right hand side. As you build up a document of R code, you should get into the habit of saving your script periodically in case of an unexpected software crash.
8. We can now begin to write a script without the need to run each line every time we press enter. In the script editor type:
```{r cache=TRUE}
A <- 1
B <- 2
C <- A+B
C
```
9. Select (highlight) all four lines and run them together with Ctrl Return(↲). You will notice the lines appear in the console (the other window). If you type C and press enter in the console (C and then ctrl return in the script window) you should have the number 3 appear. From now on I recommend you type all the commands below in the script first and then run them. Copying and pasting from this document won’t necessarily work.
10. You will also notice that in RStudio, values A, B and C will appear in your workspace window (top right). These variables are stored in memory for future use. Try giving A and B different values and see what happens. What about if you use lower case letters?
11. You have just demonstrated one of the powerful aspects of R, which is that it is an **object oriented** programming language. A, B and C are all objects that have been assigned a value with the <- symbol (you can also use the = sign, but it operates slightly differently to <- in R, plus the arrow assignment has become standard over the years. Use **alt -** to type it automatically). This principle underlies the whole language and enables users to create ever more complex objects as they progress through their analysis. If you type:
```{r}
ls()
```
R will produce a list of objects that are currently active.
```{r cache=TRUE}
rm(A)
```
will remove the object A from the workspace (do ```ls()``` again to check this or look in your workspace window).
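To answer the lower case question from earlier: R is case sensitive, so two names that differ only in case are two separate objects. The sketch below uses made-up object names (`score` and `Score`) so it won't clobber anything in your workspace.

```r
# Two different objects: names that differ only in case do not clash
score <- 1
Score <- 10
total <- score + Score        # 11

# ls() lists the objects that currently exist
had_score <- "score" %in% ls()

# rm() removes one of them; the other is untouched
rm(score)
has_score_after <- "score" %in% ls()
still_has_Score <- "Score" %in% ls()

print(c(had_score, has_score_after, still_has_Score))
```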
### Functions
12. Both ```rm()``` and ```ls()``` are known as functions. Functions are the other fundamental aspect to the R language. Functions can be thought of as single or multiple calculations that you apply to objects. They generally take the form of...(don't run these)
```{r, eval=FALSE, cache=TRUE}
function(object, argument1, argument2, argument3)
```
Where the object is some form of data and the arguments parameterise what the function will do.
You could save the output to a new object using something like...
```{r, eval=FALSE, cache=TRUE}
X<-function(object, argument1, argument2, argument3)
```
13. You can write your own functions to carry out tasks (and we’ll come onto that in subsequent practical sessions), but normally you will just use one of the virtually infinite number of functions that other people have already written for us.
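As a taster, here is a minimal sketch of a home-made function. The name `rescale01` and its arguments are hypothetical (they are not part of R or of this practical); the point is just to show the object-plus-arguments pattern described above.

```r
# A hypothetical helper: rescale a numeric vector so it runs from 0 to 1
rescale01 <- function(x, na.rm = TRUE) {
  rng <- range(x, na.rm = na.rm)        # smallest and largest value
  (x - rng[1]) / (rng[2] - rng[1])      # the last expression is returned
}

# Call it with an object and save the output to a new object
scaled <- rescale01(c(2, 4, 6))
print(scaled)

# Arguments can be given by name, in any order
scaled_named <- rescale01(na.rm = TRUE, x = c(2, 4, 6))
```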
### Basic plotting
One common function is the ```plot()``` function for displaying data as a graphical output. Add these lines to your script and run them as before and you can see some ```plot()``` outputs:
```{r, cache=TRUE}
#create some datasets, first a vector of 1-100 and 101-200
Data1 <- c(1:100)
Data2 <- c(101:200)
#Plot the data
plot(Data1, Data2, col="red")
```
```{r, cache=TRUE}
#just for fun, create some more, this time some normally distributed
#vectors of 100 numbers
Data3 <- rnorm(100, mean = 53, sd=34)
Data4 <- rnorm(100, mean = 64, sd=14)
#plot
plot(Data3, Data4, col="blue")
```
14. In the code above, you will have noticed the ```#``` symbol. This signifies that whatever comes after it on that line is a comment. Comments are ignored by the R console and they allow you to annotate your code so that you know what it is doing. It is good programming practice to comment your code extensively so that you can keep track of what your scripts are for.
**Warning** Heed our advice now and comment your code; it will save you time in the future!
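One thing you may have spotted: the ```rnorm()``` plot looks different every time you run it, because the numbers are random. If you want the same 'random' numbers each time, one option is ```set.seed()```. The seed value 42 below is an arbitrary choice for illustration.

```r
# Fix the random number generator so the draw can be repeated
set.seed(42)
first_draw <- rnorm(5, mean = 53, sd = 34)

# Resetting the same seed gives exactly the same numbers again
set.seed(42)
second_draw <- rnorm(5, mean = 53, sd = 34)

print(identical(first_draw, second_draw))
```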
### Help
15. The previous lines of code also demonstrated a number of functions: ```c()``` concatenates a string of numbers together into a vector; ```1:100``` means produce the integers between and including 1 and 100; and the ```plot()``` function plots the two data objects and includes a parameter to change the colour of the points. To understand what a function does, you can consult the R Help system. Simply type a question mark and then the function name; for example:
```{r eval=FALSE, cache=TRUE}
?plot
```
```{r echo=FALSE, out.width = "300px", fig.align='center', cache=TRUE}
knitr::include_graphics('prac2_images/rhelp.png')
```
16. In RStudio you will see the help file appear in the Help window in the bottom right of the GUI. Here you can also search for the help files for other functions in the search bar.
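If you only want a quick reminder of a function's arguments rather than the whole help page, you can also inspect them from code. A small sketch using ```rnorm()``` from earlier:

```r
# help("plot") is the long-hand equivalent of ?plot
# ??scatter would search all installed help pages for the word "scatter"

# args() prints the arguments a function accepts
args(rnorm)

# formals() returns them as a list, including any default values
arg_names <- names(formals(rnorm))
default_mean <- formals(rnorm)$mean

print(arg_names)
print(default_mean)
```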
### Data types
17. Objects in R can exist as a number of different data types. These include a **matrix**, a **vector**, a **data frame** and a **list**. For the purposes of this practical we will focus on data frames. These are the most flexible data format in R (although tibbles are now becoming popular as well). Data frames can be conceptualised in a similar way to a spreadsheet with data held in rows and columns. They are the most commonly used object type in R and the most straightforward to create from the two vector objects we just created.
```{r, cache=TRUE}
df <- data.frame(Data1, Data2)
plot(df, col="green")
```
18. If you have a very large data frame (thousands or millions of rows) it is useful to see only a selection of these. There are several ways of doing this:
```{r, cache=TRUE}
#show the first and then the last rows of data in df (6 rows each by default)...
head(df)
```
```{r, cache=TRUE}
tail(df)
```
You can also view elements of your data frame in RStudio by simply clicking on it in the top-right Environment window:
```{r echo=FALSE, out.width = "800px", fig.align='center', cache=TRUE}
knitr::include_graphics('prac2_images/dataview.png')
```
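To make the different data types a little more concrete, here is a rough sketch that builds one of each and then uses the `n` argument of ```head()``` and ```tail()``` to control how many rows are shown. All object names and values here are invented for illustration.

```r
# One example of each of the data types mentioned above
num_vector <- c(1.5, 2.5, 3.5)
num_matrix <- matrix(1:6, nrow = 2, ncol = 3)
mixed_list <- list(name = "Camden", population = 100, codes = c("a", "b"))
small_df   <- data.frame(id = 1:12, value = seq(10, 120, by = 10))

# Check what each one is
print(class(num_vector))
print(dim(num_matrix))
print(class(small_df))

# head() and tail() show 6 rows unless you ask for a different number
first_ten  <- head(small_df, n = 10)
last_three <- tail(small_df, n = 3)
```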
### Elements of a data frame
19. When programming you will frequently want to refer to different elements in a data frame or a vector/list. To select elements of a data frame, or subset it, you can refer specifically to ranges or elements of rows and columns. These are accessed using the single square bracket operator [], with the form:
```{r eval=FALSE, cache=TRUE}
data.frame[row,column]
```
Rows are always referenced first, before the comma, columns second, after the comma.
20. Try subsetting your df data frame with the following commands to see what is returned:
```{r cache=TRUE}
df[1:10, 1]
df[5:15,]
df[c(2,3,6),2]
df[,1]
```
21. You will note that the column headings are the names of the original objects creating the data frame. We can change these using the ```colnames()``` function:
```{r, cache=TRUE}
colnames(df)<- c("column1", "column2")
```
To select or refer to these columns directly by name, we can either use the ```$``` operator, which takes the form ```data.frame$columnName```, e.g.
```{r, cache=TRUE}
df$column1
```
or we can use the double square bracket operator [[]], and refer to our column by name using quotes e.g.
```{r, cache=TRUE}
df[["column1"]]
```
This again is useful if you have a lot of columns and you wish to efficiently extract one of them.
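The different ways of getting at rows and columns can be compared side by side. This is a sketch on a small stand-in data frame (`df_demo` is a made-up name so your own `df` is left alone); the `drop = FALSE` argument and the logical row filter are additions that go slightly beyond the text above.

```r
# A small stand-in for df, with the renamed columns
df_demo <- data.frame(column1 = 1:5, column2 = 101:105)

# [row, column]: a single cell
one_cell <- df_demo[2, 2]

# A single column comes back as a plain vector...
as_vector <- df_demo[, 1]

# ...unless you ask R not to drop the data frame structure
as_data_frame <- df_demo[, 1, drop = FALSE]

# $ and [[ ]] both pull out a column by name
by_dollar   <- df_demo$column2
by_brackets <- df_demo[["column2"]]

# Rows can also be picked with a condition instead of row numbers
big_rows <- df_demo[df_demo$column1 > 3, ]
print(big_rows)
```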
## Reading data into R
One of the most tedious things a spatial analyst / data scientist has to do is clean their data so it doesn’t cause problems for the software later. In the past, we would have needed to do this by hand --- these days, we can use software to do much of this for us.
I will now give you two options to arrive at a nice cleaned dataset. If you have issues with software packages etc, you might still need to go via the old skool route; however, the new skool route will be much more satisfying if it works!
For this example we are going to use the London Datastore Catalogue.
Go to: https://data.london.gov.uk/dataset/f33fb38c-cb37-48e3-8298-84c0d3cc5a6c and download the Excel document for ward profiles.
### Old skool cleaning
22. Open the ```ward-profiles-excel-version.xls``` file in Excel, and save it as ```ward-profiles-excel-version.csv``` (the file name used in the code below) into your wk2/RProject folder.
23. Open your new ```.csv``` file in Excel. There might be some non-numeric values inside numeric columns which will cause problems in your analysis. These need to be removed before proceeding any further. To remove these, you can use the replace function in Excel. In the home tab under ‘Editing’ open up the find and replace dialogue box and enter the following into the find box:
```#VALUE!``` ```n/a```
Leave the replace box empty each time and click Replace All to remove these from your file, before saving the file again.
24. Once you have cleaned out all of the trixy characters from the file, to read it into R, we will use the ```read.csv()``` function:
```{r, cache=TRUE}
LondonDataOSK<- read.csv("prac2_data/ward-profiles-excel-version.csv")
```
> **Note**, I've made an R project for all these practicals, which is why my file path starts with ```prac2_data/```. If you save the ```.csv``` in the same folder as the ```.Rproj``` then you can just use:
```{r eval=FALSE, cache=TRUE}
LondonDataOSK<- read.csv("ward-profiles-excel-version.csv")
```
If you look at the ```read.csv()``` help file - ```?read.csv``` - you will see that we can actually include many more parameters when reading in a .csv file. For example, we could read in the same file as follows:
```{r, cache=TRUE}
# by default in R, the file path should be defined with / but on a
# windows file system it is defined with \. Using \\ instead allows R
# to read the path correctly; alternatively, just use /
LondonDataOSK<- read.csv("prac2_data/ward-profiles-excel-version.csv",
header = TRUE, sep = ",")
```
This would specify that the first row of the file contains header information; and the values in the file are separated with commas (not ; or : as can be the case sometimes).
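Another parameter worth knowing about is `na.strings`, which lets ```read.csv()``` do some of the old skool cleaning for you. The sketch below is an assumption-laden toy: the three-row dataset is invented and read from a text string (via the `text` argument) rather than from the real ward profiles file.

```r
# A tiny made-up csv containing the two troublesome values
csv_text <- "ward,pop,rate
A,100,n/a
B,#VALUE!,2.5
C,300,3.1"

# Treat n/a and #VALUE! as missing values while reading
demo <- read.csv(text = csv_text, header = TRUE, sep = ",",
                 na.strings = c("n/a", "#VALUE!"))

# Both columns now come through as numbers, with NA where the junk was
print(sapply(demo, class))
print(demo)
```

Without `na.strings`, the `pop` and `rate` columns would be read in as text.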
### New skool cleaning
To clean our data as we read it in, we are going to use a package (more about packages later --- for now, just think about it as a lovely gift from the R gods) called ```readr``` which comes bundled as part of the ```tidyverse``` package. If you want to find out more about the `tidyverse` (and you really should) then you should start [here](https://www.tidyverse.org/) --- the `tidyverse` package contains almost everything you need to become a kick-ass data scientist. ‘Tidy’ as a concept in data science is well worth reading about and you should start here with Hadley Wickham’s [paper](http://vita.had.co.nz/papers/tidy-data.pdf)
Anyway, first install the package:
```{r, message=FALSE, eval=FALSE, cache=TRUE}
install.packages("tidyverse")
```
Now we can use the ```readr``` package which comes bundled as part of the ```tidyverse``` to read in some data (directly from the web this time --- ```read.csv``` can do this too) and clean text characters out from the numeric columns before they cause problems:
```{r, message=FALSE, cache=TRUE}
library(tidyverse)
#wang the data in straight from the web using read_csv,
#skipping over the 'n/a' entries as you go...
LondonData <- read_csv("https://files.datapress.com/london/dataset/ward-profiles-and-atlas/2015-09-24T14:21:24/ward-profiles-excel-version.csv",
locale = locale(encoding = "latin1"),
na = "n/a")
```
> **Note** the use of read_csv here as opposed to read.csv. They are very similar, but read_csv is just a bit better. Read [this](http://yetanothermathprogrammingconsultant.blogspot.com/2016/12/reading-csv-files-in-r-readcsv-vs.html) for more information. Also, for those python fans out there ---IT’S NOT THE SAME FUNCTION AS READ_CSV IN PYTHON
What is `locale = locale(encoding = "latin1")`? Good question. It is basically the encoding of the data (how it is stored). There are a few different formats such as UTF-8 and latin1. In latin1 each character is 1 byte long; in UTF-8 a character can consist of more than 1 byte. `readr` (the package we are using to read in the `.csv`) assumes UTF-8 by default, whereas this file is stored as latin1, so we have to specify it.
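The one-byte versus more-than-one-byte point can be checked directly in base R. A minimal sketch, using the letter é as an arbitrary example character:

```r
# The letter e-acute, stored as UTF-8
e_utf8 <- enc2utf8("\u00e9")

# The same letter converted to latin1
e_latin1 <- iconv(e_utf8, from = "UTF-8", to = "latin1")

# One character either way, but a different number of bytes
chars_utf8   <- nchar(e_utf8, type = "chars")
bytes_utf8   <- nchar(e_utf8, type = "bytes")
bytes_latin1 <- nchar(e_latin1, type = "bytes")

print(c(chars_utf8, bytes_utf8, bytes_latin1))
```

Reading latin1 bytes as if they were UTF-8 (or the other way round) is what produces garbled accented characters, which is why the encoding is stated explicitly above.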
### Examining your new data
25. Your new data has been read in as a data frame / tibble (a tibble is just a data frame with a few extra bells and whistles). If you ever need to check what data type your new data set is, we can use the ```class()``` function:
```{r cache=TRUE}
class(LondonData)
```
```{r, eval=FALSE, cache=TRUE}
# or, if you have your old skool data from step 24 above
class(LondonDataOSK)
```
We can also use the ```class()``` function within another two functions (```cbind()``` and ```lapply()```) to check that our data has been read in correctly and that, for example, numeric data haven’t been read in as text or other variables. Run the following line of code:
```{r cache=TRUE}
datatypelist <- data.frame(cbind(lapply(LondonData,class)))
```
You should see that all columns that should be numbers are read in as numeric. Try reading in LondonData again, but this time without excluding the ‘n/a’ values in the file, e.g.
```{r, message=FALSE, cache=TRUE}
LondonData <- read_csv("https://files.datapress.com/london/dataset/ward-profiles-and-atlas/2015-09-24T14:21:24/ward-profiles-excel-version.csv",
locale = locale(encoding = "latin1"))
```
Now run the datatypelist line of code again. You should see that some of the columns (those with the n/a values in) have been read in as something other than numeric. This is why we need to exclude them. Isn’t ```readr``` great for helping us avoid reading in our numeric data as text!
To show this, print the values in the column ```% children in reception year who are obese - 2011/12 to 2013/14``` and the column ```Mean Age - 2013```:
```{r, message=FALSE, cache=TRUE}
# col has n/a and is not numeric
LondonData$`% children in reception year who are obese - 2011/12 to 2013/14`
# no n/a values in this column
LondonData$`Mean Age - 2013`
```
26. You can also open your data in a simple spreadsheet-like editor and change values by hand using the ```edit()``` function:
```{r, eval=FALSE, cache=TRUE}
LondonData <- edit(LondonData)
```
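The column type check described above can be tried out on a toy example without downloading anything. In this sketch the data frame `toy` and its values are invented; only the idea (one stray "n/a" turns a whole column into text) comes from the practical.

```r
# A made-up data frame where one column contains an n/a text entry
toy <- data.frame(ward = c("A", "B", "C"),
                  mean_age = c(34.1, 36.5, 40.2),
                  obese_pct = c("9.1", "n/a", "11.3"),
                  stringsAsFactors = FALSE)

# List the type of every column and pick out the non-numeric ones
types <- sapply(toy, class)
non_numeric <- names(types)[types != "numeric"]
print(non_numeric)

# Repair the column by hand: mark n/a as missing, then convert to numbers
toy$obese_pct[toy$obese_pct == "n/a"] <- NA
toy$obese_pct <- as.numeric(toy$obese_pct)
print(sapply(toy, class))
```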
27. It is also possible to quickly and easily summarise the data or look at the column headers using
```{r cache=TRUE}
summary(df)
```
```{r, message=FALSE, cache=TRUE}
names(LondonData)
```
### Data manipulation in R
Now we have some data read into R, we need to select a small subset to work on. The first thing we will do is select just the London Boroughs to work with. If you recall, the Borough data is at the bottom of the file.
#### Selecting rows
29. Your borough data will probably be found between rows 626 and 658. Therefore we will first create a subset by selecting these rows into a new data frame and then reducing that data frame to just four columns. There are a few ways of doing this:
We could select just the rows we need by explicitly specifying the range of rows we need:
```{r cache=TRUE}
LondonBoroughs<-LondonData[626:658,]
```
There is also a ```subset()``` function in R. You could look that up and see whether you could create a subset with that. Or, we could try a cool ‘data sciency’ way of pulling out the rows we want with the knowledge that the codes for London Boroughs start with E09 (the wards in the rest of the file start with E05).
Knowing this, we can use the ```grep()``` function which can use regular expressions to match patterns in text. Let’s try it!
```{r, message=FALSE, warning = FALSE, cache=TRUE}
LondonData <- data.frame(LondonData)
LondonBoroughs <- LondonData[grep("^E09",LondonData[,3]),]
```
Check it worked:
```{r,eval=FALSE, cache=TRUE}
head(LondonBoroughs)
```
**AWWMAHGAWD!!!** Pretty cool hey?
What that function is saying is *“grep (get) me all of the rows from the London Data data frame where the text in column 3 starts with (^) E09”*
You will notice that you will have two rows at the top for the City of London. This is because it features twice in the data set. That’s fine, we can just drop this row from our dataset:
```{r, cache=TRUE}
LondonBoroughs <- LondonBoroughs[2:34,]
```
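If you want to see what ```grep()``` is doing without the full dataset, here is a rough sketch on a four-row toy table. The names and codes are made up to mimic the E09/E05 pattern; ```grepl()```, ```subset()``` and ```duplicated()``` are shown as alternatives, not as the way the practical does it.

```r
# A toy lookup with one duplicated borough and one ward
codes <- data.frame(name = c("City of London", "City of London", "Abbey", "Camden"),
                    code = c("E09000001", "E09000001", "E05000026", "E09000007"),
                    stringsAsFactors = FALSE)

# grep() returns the positions of the rows whose code starts with E09
idx <- grep("^E09", codes$code)
boroughs <- codes[idx, ]

# grepl() returns TRUE/FALSE instead, which suits subset()
boroughs_subset <- subset(codes, grepl("^E09", code))

# duplicated() drops repeats without having to count row numbers by hand
boroughs_unique <- boroughs[!duplicated(boroughs$code), ]
print(boroughs_unique)
```

Using ```duplicated()``` is a bit more robust than hard-coding a row range, because it still works if the number of rows changes.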
#### Selecting columns
```{r, cache=TRUE}
LondonBoroughs<-LondonBoroughs[,c(1,19,20,21)]
```
30. You will have noticed the use of square brackets above; these are very useful in R. Refer back to points 19-21 above if you can’t remember how they work. The ```c()``` function is also used here. This is the ‘combine’ function, another very useful function in R, which combines its arguments (in this case, column reference numbers) into a single vector.
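Column numbers are quick to type but easy to get wrong if the file layout changes. As a sketch of the alternative, ```c()``` can just as well hold column names; the data frame and column names below are invented for illustration.

```r
# A made-up table with four columns
wards <- data.frame(name = c("Abbey", "Alibon"),
                    code = c("E05000026", "E05000027"),
                    pop = c(100, 200),
                    area = c(1.2, 1.4),
                    stringsAsFactors = FALSE)

# Select columns 1 and 3 by position...
by_position <- wards[, c(1, 3)]

# ...or ask for the same columns by name
by_name <- wards[, c("name", "pop")]

print(names(by_position))
print(names(by_name))
```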
#### Renaming columns
31. You will notice that the column names are slightly misleading as we are now working with boroughs rather than wards. We can rename the columns to something more appropriate using the ```names()``` function (there are various other functions for renaming columns, for example ```colnames()```, if you want to rename multiple columns):
```{r, cache=TRUE}
#rename the column 1 in LondonBoroughs
names(LondonBoroughs)[1] <- c("Borough Name")
```
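Renaming by position works, but you can also rename by matching the old name, which is safer if you are not sure which column number you are after. A minimal sketch on an invented data frame (the old name `Ward.name` is an assumption, not necessarily what your column is called):

```r
# A made-up table with a misleading first column name
boroughs <- data.frame(Ward.name = c("Camden", "Hackney"),
                       Population = c(100, 200),
                       stringsAsFactors = FALSE)

# Rename whichever column is currently called Ward.name
names(boroughs)[names(boroughs) == "Ward.name"] <- "Borough Name"
print(names(boroughs))

# Names containing spaces need backticks with $ or quotes with [[ ]]
with_dollar   <- boroughs$`Borough Name`
with_brackets <- boroughs[["Borough Name"]]
```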
### Plotting
```{r, cache=TRUE}
plot(LondonBoroughs$Male.life.expectancy..2009.13,
LondonBoroughs$X..children.in.reception.year.who.are.obese...2011.12.to.2013.14)
```
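You might wonder where those dotted column names came from. When we ran ```data.frame(LondonData)``` earlier, the column names were tidied into syntactically valid R names, which by default is done with the rules of ```make.names()```. The sketch below shows the effect on two of the original column names quoted earlier in this practical.

```r
# Two column names as they appear in the original file
original <- c("Mean Age - 2013",
              "% children in reception year who are obese - 2011/12 to 2013/14")

# make.names() swaps invalid characters for dots and adds an X if needed
converted <- make.names(original)
print(converted)

# data.frame() applies the same rules unless check.names = FALSE
kept    <- data.frame(`Mean Age - 2013` = c(35, 36), check.names = FALSE)
mangled <- data.frame(kept)
print(names(kept))
print(names(mangled))
```

If you would rather keep the original names, `check.names = FALSE` is one option, at the cost of needing backticks every time you refer to the column.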
### Pimp my graph!
Now, of course, because this is R, we can pimp this graph using something a bit more fancy than the base graphics functions:
```{r, eval=FALSE, cache=TRUE, message=FALSE, warning=FALSE}
install.packages("plotly")
```
```{r message=FALSE, warning=FALSE}
library(plotly)
plot_ly(LondonBoroughs,
x = ~Male.life.expectancy..2009.13,
y = ~X..children.in.reception.year.who.are.obese...2011.12.to.2013.14,
text = ~LondonBoroughs$`Borough Name`,
type = "scatter",
mode = "markers")
```
### Spatial Data in R
This next part of the practical applies the same principles introduced above to the much more complex problem of handling spatial data within R. In this workshop we will produce a gallery of maps using many of the plotting tools available in R. The resulting maps will not be that meaningful --- the focus here is on sound visualisation with R and not sound analysis (I know one is useless without the other!). Good quality spatial analysis will come in the rest of the module.
Whilst the instructions are step by step you are encouraged to start deviating from them (trying different colours for example) to get a better understanding of what we are doing.
#### Packages
In this section we’ll require even more specialist packages, so I should probably spend some more time explaining what packages actually are! Packages are bits of code that extend R beyond the basic statistical functionality it was originally designed for. For spatial data, they enable R to process spatial data formats, carry out analysis tasks and create some of the maps that follow.
Basically, without packages, R would be very limited. With packages, you can do virtually anything! One of the issues you will come across is that packages are being continually developed and updated and unless you keep your version of R updated and your packages updated, there may be some functions and options not available to you. This can be a problem, particularly with University installations which (at best) may only get updated once a year. Therefore, apologies in advance if things don’t work as intended!
1. In RStudio all packages can be installed and activated in the ‘Packages’ tab in the bottom-right hand window:
```{r echo=FALSE, out.width = "500px", fig.align='center', cache=TRUE}
knitr::include_graphics('prac2_images/r_packages.png')
```
2. As with everything else in R though, we can also run everything from the command line. The first package we need to install for this part of the practical is ```maptools```. Either find and install it using the RStudio GUI or do the following:
```{r, eval=FALSE, cache=TRUE}
install.packages("maptools")
```
There are a few other packages we’ll need to get to grips with. Some, like ```ggplot2``` (one of the most influential R packages ever) are part of the ```tidyverse``` package we came across earlier. Others we will need to install for the first time.
```{r, eval=FALSE, cache=TRUE}
install.packages(c("OpenStreetMap", "classInt", "tmap"))
# might also need these ones
install.packages(c("RColorBrewer", "sp", "rgeos",
"tmaptools", "sf", "downloader", "rgdal",
"geojsonio"))
```
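If you are not sure which of these packages are already on your machine, a quick check saves reinstalling everything. The sketch below is only an illustration: the `needed` vector is my own shortlist rather than the full set above, and the install line is left commented out so that nothing is downloaded unless you ask for it.

```r
# A shortlist of packages used in this practical (an illustrative subset)
needed <- c("tmap", "sf", "ggplot2", "classInt", "RColorBrewer")

# TRUE where a package is already installed in the current library
installed <- needed %in% rownames(installed.packages())
names(installed) <- needed
print(installed)

# Install only what is missing (uncomment to run)
# install.packages(needed[!installed])

# Version numbers of the installed ones, handy when reporting a problem
for (p in needed[installed]) {
  cat(p, as.character(packageVersion(p)), "\n")
}
```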
4. Now that the packages have been installed you will not have to repeat the above steps again (when you use your account in these cluster rooms). Open a new script and save it to your working directory as ```wk2_maps.r```. As before, type each of the lines of code into this window, then select them and press Ctrl + Return to run them. Be sure to save your script often.
5. The first task is to load the packages we have just installed. Note, you might have some issues with the OpenStreetMap package if the installation of Java on your computer doesn’t match your installation of R --- e.g. if you have installed the 64-bit version of R, you also need the 64-bit version of Java (same with the 32-bit versions) --- you may also need to install the package Rcpp separately and try again.
Install Java 64-bit from: https://java.com/en/download/manual.jsp
```{r, message=FALSE, warning=FALSE, cache=TRUE}
#Load Packages (ignore any warning messages about being built under a
#different R version):
library(maptools)
library(RColorBrewer)
library(classInt)
library(OpenStreetMap)
library(sp)
library(rgeos)
library(tmap)
library(tmaptools)
library(sf)
library(rgdal)
library(geojsonio)
```
#### Background to spatial data in R
R has a very well developed ecosystem of packages for working with Spatial Data. Early pioneers like Roger Bivand and Edzer Pebesma along with various colleagues were instrumental in writing packages to interface with some powerful open source libraries for working with spatial data, such as GDAL and GEOS. These were accessed via the ```rgdal``` and ```rgeos``` packages. The ```maptools``` package by Roger Bivand, amongst other things, allowed Shapefiles to be read into R. The ```sp``` package (along with ```spdep```) by Edzer Pebesma was very important for defining a series of classes and methods for spatial data natively in R which then allowed others to write software to work with these formats. Other packages like ```raster``` advanced the analysis of gridded spatial data, while packages like ```classInt``` and ```RColorBrewer``` facilitated the binning of data and colouring of choropleth maps.
Whilst these packages were extremely important for advancing spatial data analysis in R, they were not always the most straightforward to use --- making a map in R could take quite a lot of effort and the results were static and visually basic. However, more recently new packages have arrived to change this. Now ```leaflet``` enables R to interface with the Leaflet JavaScript library for online, dynamic maps. ```ggplot2```, which was developed by Hadley Wickham and colleagues, radically changed the way that people thought about and created graphical objects in R, including maps, and introduced a graphical style which has been the envy of other software to the extent that there are now libraries in Python which copy the ```ggplot2``` style!
Building on all of these, the new ```tmap``` (Thematic Map) package has changed the game completely and now enables us to read, write and manipulate spatial data and produce visually impressive and interactive maps, very easily. In parallel, the ```sf``` (Simple Features) package is helping us re-think the way that spatial data can be stored and manipulated. It’s exciting times for geographic information / spatial data science!
#### Making some choropleth maps
Choropleth maps are thematic maps which colour areas according to some phenomenon. In our case, we are going to fill some irregular polygons (the London Boroughs) with a colour that corresponds to a particular attribute.
As with all plots in R, there are multiple ways we can do this. The basic ```plot()``` function requires no data preparation but additional effort in colour selection, adding the map key etc. ```qplot()``` and ```ggplot()``` (from the ```ggplot2``` package) require some additional steps to format the spatial data but select colours and add keys etc. automatically. Here, we are going to make use of the new ```tmap``` package which makes making maps very easy indeed.
6. So one mega cool thing about R is you can read spatial data in straight from the internetz! Try this below for downloading a GeoJSON file...it might take a few minutes...
```{r, cache=TRUE}
EW <- geojsonio::geojson_read("https://opendata.arcgis.com/datasets/8edafbe3276d4b56aec60991cbddda50_4.geojson", what = "sp")
```
Or you can do a manual download from [here](http://geoportal.statistics.gov.uk/datasets/8edafbe3276d4b56aec60991cbddda50_2) --- see point 7.
Pull out London using ```grep()``` and the regex anchor for 'start of the string' (^) to look for the bit of the district code that relates to London (E09) in the 'lad15cd' column in the data slot of our spatial polygons data frame:
```{r, cache=TRUE}
LondonMap <- EW[grep("^E09",EW@data$lad15cd),]
#plot it using qtm (quick thematic map) from tmap
qtm(LondonMap)
```
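To see what the `^` anchor is doing without downloading anything, here is a minimal sketch on a made-up vector of district codes (the codes and variable names are illustrative and not taken from the real file).

```r
# Made-up district codes for illustration only
codes <- c("E06000001", "E09000001", "E09000033", "W06000001", "XE09")

# grep() returns the positions of the elements that match the pattern
london_idx <- grep("^E09", codes)
print(london_idx)

# Use those positions to subset, just as we did with the rows of EW
london_codes <- codes[london_idx]
print(london_codes)

# Without the ^ anchor the pattern can match anywhere in the string,
# so "XE09" is picked up as well
loose_codes <- codes[grep("E09", codes)]
print(loose_codes)
```

Only the second and third codes begin with E09, so only those two are kept by the anchored pattern.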
7. Of course, we can also read in our data from a shapefile stored in a local directory:
```{r, cache=TRUE}
#read the shapefile into a simple features object
BoroughMapSF <- st_read("prac1_data/statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp")
BoroughMapSP <- LondonMap
#plot it very quickly using qtm (quick thematic map) to check
#it has been read in correctly
qtm(BoroughMapSF)
```
```{r, cache=TRUE}
qtm(BoroughMapSP)
```
8. And naturally we can convert between simple features objects and SpatialPolygonsDataFrames very easily:
```{r, cache=TRUE}
library(methods)
#check the class of BoroughMapSF
class(BoroughMapSF)
```
```{r, cache=TRUE}
#And check the class of BoroughMapSP
class(BoroughMapSP)
```
```{r, cache=TRUE}
#now convert the SP object into an SF object...
newSF <- st_as_sf(BoroughMapSP)
#and try the other way around SF to SP...
newSP <- as(newSF, "Spatial")
#simples!
BoroughMapSP <- as(BoroughMapSF, "Spatial")
```
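If you don't have the borough shapefile to hand, the same round trip can be tried on a hand-made polygon. This is a minimal sketch, assuming the `sf` and `sp` packages are installed; the unit square and the names `toySF`, `toySP` and `backSF` are made up for illustration.

```r
library(sf)

# A hand-made unit square, labelled as British National Grid (EPSG:27700)
sq <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
toySF <- st_sf(name = "unit square", geometry = st_sfc(sq, crs = 27700))

# sf -> sp (needs the sp package to be installed)
toySP <- as(toySF, "Spatial")
print(class(toySP))

# sp -> sf
backSF <- st_as_sf(toySP)
print(class(backSF))

# The attribute column survives the round trip
print(backSF$name)
```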
#### Attribute data
OK, enough messing around, show us the maps!!
9. Hold your horses, before we can create a map, we need to join some attribute data to some boundaries. Doing this on an ```SP``` object can be a bit of a pain, but I’ll show you here:
```{r, cache=TRUE, results="hide"}
#join the data to the @data slot in the SP data frame
BoroughMapSP@data <- data.frame(BoroughMapSP@data,LondonData[match(BoroughMapSP@data[,"GSS_CODE"],LondonData[,"New.code"]),])
#check it's joined.
head(BoroughMapSP@data)
```
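The `match()` call is the bit doing the work in that one-liner, so it is worth seeing on its own. Below is a small sketch with two toy data frames standing in for the boundary attribute table and `LondonData`; the values are invented and only the column names are borrowed from above.

```r
# Toy stand-ins for the boundary attribute table and LondonData
boundaries <- data.frame(GSS_CODE = c("E09000002", "E09000001", "E09000003"))
lookup <- data.frame(New.code = c("E09000001", "E09000002"),
                     value = c(10, 20))

# For each boundary code, find the row of lookup holding the same code
# (NA where there is no match)
idx <- match(boundaries$GSS_CODE, lookup$New.code)
print(idx)

# Re-order lookup with those positions and bind it alongside the boundaries
joined <- data.frame(boundaries, lookup[idx, ])
print(joined)
```

Because the rows of the boundary table are never re-ordered, the polygons and their attributes stay lined up, which is what an ```SP``` object needs.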
10. Joining data to an ```sf``` object is a bit more intuitive with ```merge()```:
```{r, warning=FALSE, cache=TRUE}
BoroughDataMap<-merge(BoroughMapSF,
LondonData,
by.x="GSS_CODE",
by.y="New.code",
no.dups = TRUE)
```
An alternative to ```merge()``` would be to use a ```left_join()``` from ```dplyr``` (like a left join in SQL):
```{r, cache=TRUE, warning=FALSE, message=FALSE}
BoroughDataMap2 <- BoroughMapSF %>% left_join(LondonData,
by = c("GSS_CODE" = "New.code"))
```
However, you would need to remove the duplicate City of London row afterwards.
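Where does the duplicate come from? A left join keeps every matching row from the right-hand table, so a code that appears twice there appears twice in the result. Here is a minimal sketch using plain data frames; I am assuming (as the sentence above implies) that the attribute table lists the City of London code more than once, and the table names and values are made up.

```r
library(dplyr)

# Toy tables: the attribute table lists the City of London code twice
boroughs <- data.frame(GSS_CODE = c("E09000001", "E09000002"),
                       NAME = c("City of London", "Barking and Dagenham"))
attrs <- data.frame(New.code = c("E09000001", "E09000001", "E09000002"),
                    value = c(10, 10, 20))

# left_join keeps every row of boroughs and every matching row of attrs,
# so the duplicated code produces two City of London rows
joined <- left_join(boroughs, attrs, by = c("GSS_CODE" = "New.code"))
print(nrow(joined))

# Keep the first row for each code
deduped <- distinct(joined, GSS_CODE, .keep_all = TRUE)
print(nrow(deduped))
```

`distinct()` is just one way of dropping the extra row; filtering it out by hand would do the same job.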
#### Making some maps
If you want to learn a bit more about the sorts of things you can do with tmap, then there are 2 vignettes that you can access [here](https://cran.r-project.org/web/packages/tmap/) --- I suggest you refer to these to see the various things you can do using tmap. Here’s a quick sample though:
11. We can create a choropleth map very quickly now using ```qtm()```
```{r, cache=TRUE}
library(tmap)
library(tmaptools)
tmap_mode("plot")
```
```{r, cache=TRUE}
qtm(BoroughDataMap,
fill = "Rate.of.JobSeekers.Allowance..JSA..Claimants...2015")
```
You can also add a basemap and some other guff, if you wish...This part of the practical originally used the following code (do not run it):
```{r, eval=FALSE, cache=TRUE}
tmap_mode("plot")
st_transform(BoroughDataMap, 4326)
london_osm <- tmaptools::read_osm(BoroughDataMap, type = "esri", zoom = NULL)
qtm(BoroughDataMap) +
tm_shape(BoroughDataMap) +
tm_polygons("Rate.of.JobSeekers.Allowance..JSA..Claimants...2015",
style="jenks",
palette="YlOrBr",
midpoint=NA,
title="Rate per 1,000 people",
alpha = 0.5) +
tm_compass(position = c("left", "bottom"),type = "arrow") +
tm_scale_bar(position = c("left", "bottom")) +
tm_layout(title = "Job seekers' Allowance Claimants", legend.position = c("right", "bottom"))
```
However, there seems to be an issue with ```read_osm()``` that means even the example data won't work. I've logged the issue on GitHub and Stack Overflow and will most likely get some pretty direct comments. Have a look:
* https://github.com/mtennekes/tmaptools/issues/17
* https://stackoverflow.com/questions/57408279/parameter-error-in-tmaptools-package-read-osm-when-using-example-data/57432541#57432541
So in the meantime I've made a bit of a workaround that uses the packages ```ggmap``` and ```ggplot2```... You can find this later on in this practical when we cover ```ggplot2``` and extra map features in ([Maps with extra features]).
12. How about more than one map, perhaps using different data breaks...
```{r, cache=TRUE, warning=FALSE, message=FALSE}
tm_shape(BoroughDataMap) +
tm_polygons(c("Average.Public.Transport.Accessibility.score...2014", "Violence.against.the.person.rate...2014.15"),
style=c("jenks", "pretty"),
palette=list("YlOrBr", "Purples"),
auto.palette.mapping=FALSE,
title=c("Average Public Transport Accessibility", "Violence Against the Person Rate"))
```
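The `style` argument decides where the class boundaries fall. To get a feel for the difference between "jenks" and "pretty" away from the map, you can compute breaks directly with the ```classInt``` package we loaded earlier. This is only a sketch on invented numbers, and I am assuming the style names mean the same thing here as they do inside ```tmap```.

```r
library(classInt)

# Made-up values with three obvious clusters
x <- c(1, 2, 3, 10, 11, 12, 50, 52, 55)

# Natural breaks (Jenks): boundaries are placed to minimise
# the variation within each class
jenks_ci <- classIntervals(x, n = 3, style = "jenks")
print(jenks_ci$brks)

# "pretty" picks round-number boundaries; n is treated as a suggestion
pretty_ci <- classIntervals(x, n = 3, style = "pretty")
print(pretty_ci$brks)
```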
You will notice that to choose the colour of the maps, I entered some codes. These are the names of colour ramps from the ```RColorBrewer``` package which comes bundled with ```tmap```. ```RColorBrewer``` uses colour palettes available from the colorbrewer2 [website](http://colorbrewer2.org/) which is in turn based on the [work of Cynthia Brewer and colleagues at Penn State University](http://www.personal.psu.edu/cab38/ColorBrewer/ColorBrewer_updates.html). Cynthia Brewer has carried out a large amount of academic research into determining the best colour palettes for GIS applications and so we will defer to her expertise here.
If you want to look at the range of colour palettes available, as well as going to the ColorBrewer website, you can use a little shiny app which comes bundled with ```tmaptools```
```{r, eval=FALSE, cache=TRUE}
#You might need to install the shinyjs package for this to work
install.packages("shinyjs")
```
```{r, eval=FALSE, cache=TRUE}
library(shinyjs)
#it's possible to explicitly tell R which
#package to get the function from with the :: operator...
tmaptools::palette_explorer()
```
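If the shiny app won't start, you can still inspect the palettes from the console with ```RColorBrewer``` itself. A short sketch, using the palette names that appear in the maps in this section:

```r
library(RColorBrewer)

# Five colours from the sequential yellow-orange-brown ramp
cols <- brewer.pal(5, "YlOrBr")
print(cols)

# How many classes each palette supports and whether it is
# sequential, diverging or qualitative
print(brewer.pal.info[c("YlOrBr", "Purples", "PuRd"), ])
```

The hex codes in `cols` can be passed to anything that accepts a vector of colours.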
13. ```tmap``` will even let you make a FRICKING INTERACTIVE MAP!!! Oh yes, we can do interactive maps…!
```{r, cache=TRUE, warning=FALSE, message=FALSE}
tmap_mode("view")
```
```{r, cache=TRUE, warning=FALSE, message=FALSE}
tm_shape(BoroughDataMap) +
tm_polygons("X..children.in.year.6.who.are.obese..2011.12.to.2013.14",
style="cont",
palette="PuRd",
midpoint=NA,
title="Truffle Shuffle Intensity")+
tmap_options(max.categories = 5)
```
```{r, cache=TRUE, eval=FALSE}
####You can even save your map as an html file
tmap_save(filename = "truffle.html")
```
#### Have a play around…
There are loads of options for creating maps with ```tmap``` --- read the vignettes that have been provided by the developers of the package and see if you can adapt the maps you have just made --- or even make some alternative maps using built in data.
* https://cran.r-project.org/web/packages/tmap/vignettes/tmap-nutshell.html
* https://cran.r-project.org/web/packages/tmap/vignettes/tmap-modes.html
You should also read the reference manual on the [package homepage](https://cran.r-project.org/web/packages/tmap/)
In fact, since I wrote this the ```tmap``` package has developed quite a bit more --- have a look at some of the cool examples [here](https://github.com/mtennekes/tmap)
Have a play and see what cool shiz you can create!
This is an example from the BubbleMap folder on the ```tmap``` GitHub. Don't worry about what GitHub is; we will cover that soon.
```{r message=FALSE, warning=FALSE, cache=TRUE}
# load spatial data included in the tmap package
data("World", "metro")
# calculate annual growth rate
metro$growth <- (metro$pop2020 - metro$pop2010) / (metro$pop2010 * 10) * 100
# plot
tm_shape(World) +
tm_polygons("income_grp", palette = "-Blues",
title = "Income class", contrast = 0.7, border.col = "gray30", id = "name") +
tm_text("iso_a3", size = "AREA", col = "gray30", root=3) +
tm_shape(metro) +
tm_bubbles("pop2010", col = "growth", border.col = "black",
border.alpha = 0.5,
breaks = c(-Inf, 0, 2, 4, 6, Inf) ,
palette = "-RdYlGn",
title.size = "Metro population (2010)",
title.col = "Annual growth rate (%)",
id = "name",
popup.vars=c("pop2010", "pop2020", "growth")) +
tm_format("World") +
tm_style("gray")
```
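The growth line in that example is a simple average: the total change over the decade divided by the starting population and by ten years. Here it is worked through for a hypothetical city (the population figures are invented), next to a compound rate for comparison, which is not what the example uses.

```r
# Hypothetical city: 1.0 million people in 2010, 1.2 million in 2020
pop2010 <- 1000000
pop2020 <- 1200000

# Simple average annual growth, as in the code above:
# total change over the decade, spread evenly across 10 years
growth_simple <- (pop2020 - pop2010) / (pop2010 * 10) * 100
print(growth_simple)  # 2 (% per year)

# Compound alternative: the constant yearly rate that would turn
# pop2010 into pop2020 after 10 years
growth_compound <- ((pop2020 / pop2010)^(1 / 10) - 1) * 100
print(growth_compound)
```

The compound figure comes out slightly lower than the simple one, because each year's growth builds on the previous year's.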
## Making maps using ggplot2
So as you have seen, it is possible to make very nice thematic maps with ```tmap```. However, there are other options. The ```ggplot2``` package is a very powerful graphics package that allows us to create a huge range of sophisticated plots, including maps.
The latest development version of ```ggplot2``` has support for simple features objects with the new `geom_sf` class (http://ggplot2.tidyverse.org/reference/ggsf.html), which, quite frankly, is bloody brilliant!
14. If you have not already done so, install and library the ```ggplot2``` and ```rgeos``` packages (they should be installed automatically as part of ```tidyverse``` and ```tmap``` packages, but occasionally they need to be installed separately).
15. Now there are two main ways in which you can construct a plot in ```ggplot2 ```: ```qplot() ``` and ```ggplot() ```. ```qplot ``` is short for ‘Quick plot’ and can be good for producing quick charts and maps, but is perhaps less good for constructing complex layered plots. ```ggplot() ``` is better for building up a plot layer by layer, e.g. ```ggplot()+layer1+layer2 ```, and so this is what we will use here.
16. The important elements of any ```ggplot``` layer are the aesthetic mappings --- ```aes(x, y, …)``` --- which tell ggplot where to place the plot objects. We can imagine a map just like a graph with all features mapping to an x and y axis. All geometry (```geom_```) types in ggplot feature some kind of aesthetic mapping and these can either be declared at the plot level, e.g. (don't run this)
```{r, eval=FALSE, cache=TRUE}
ggplot(data.frame,
aes(x=x, y=y))
```
or, more flexibly at the level of the individual ```geom_layer()```, e.g.
```{r, eval=FALSE, cache=TRUE}
geom_polygon(aes(x=x, y=y),
data.frame)
```
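Those two snippets are templates rather than runnable code, so here is a small sketch you can actually run. The `square` data frame is made up; both plots should draw the same orange square, the only difference being where the aesthetics are declared.

```r
library(ggplot2)

# Corner coordinates of a toy square
square <- data.frame(x = c(0, 1, 1, 0), y = c(0, 0, 1, 1))

# Aesthetics declared at the plot level: inherited by every layer
p1 <- ggplot(square, aes(x = x, y = y)) +
  geom_polygon(fill = "orange")

# Aesthetics declared at the layer level: only this geom uses them
p2 <- ggplot() +
  geom_polygon(aes(x = x, y = y), data = square, fill = "orange")

print(p1)
print(p2)
```

Declaring aesthetics per layer is handy once a plot mixes several data sets, which is usually the case with maps.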
17. To begin our plot, we will start with the map layer --- we will generate this using the ```geom_sf()``` function in ```ggplot2```:
```{r, cache=TRUE}
library(ggplot2)
ggplot()+geom_sf(mapping = aes(geometry=geometry),
data = BoroughDataMap)+
theme_minimal()
```
18. To colour your map, then just pass the name of the variable you want to map to the fill parameter in the aesthetics:
```{r, cache=TRUE}
ggplot()+geom_sf(mapping = aes(geometry=geometry,
fill=Median.Household.income.estimate..2012.13.),
data = BoroughDataMap)+
theme_minimal()
```
19. As you can see, this map looks OK, but there are a few issues with things like the colour ramp and a lack of appropriate labels. We can correct this by adding a few more layers. Firstly we can change the palette:
```{r, cache=TRUE}
palette1<-scale_fill_continuous(name="Value (£)",
low="white",
high="orange")
```
20. And some appropriate labels:
```{r, cache=TRUE}
labels<-labs(title="Median household income estimate 2012 to 2013",
x="Longitude",
y="Latitude")
```
21. Before plotting all of them together:
```{r, cache=TRUE, eval=TRUE}
ggplot()+
geom_sf(mapping = aes(geometry=geometry,
fill = Median.Household.income.estimate..2012.13.)
,data = BoroughDataMap)+
theme_minimal()+
palette1+
labels
```
Check out [this](https://www.r-spatial.org/r/2018/10/25/ggplot2-sf.html) resource for more options in ```ggplot```
### Changing projections
Until now, we’ve not really considered how our maps have been printed to the screen. The coordinates stored in the ```geometry``` column of your ```sf``` object contain the information to enable points, lines or polygons to be drawn on the screen. The first ```ggplot``` map above could fool you into thinking the coordinate system we are using is latitude and longitude, but actually, the map coordinates are stored in British National Grid.
How can we tell?
You can check the coordinate reference system of ```sf``` or ```sp``` objects using the print function:
```{r, cache=TRUE, echo=FALSE}
print(BoroughMapSP)
```
```{r, cache=TRUE, echo=FALSE}
print(BoroughMapSF)
```
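Printing the whole object works, but for ```sf``` objects `st_crs()` and `st_bbox()` go straight to the point. A minimal sketch on a single made-up point (the coordinates are simply a plausible central London location in British National Grid):

```r
library(sf)

# A single toy point in central London, in British National Grid
pt <- st_sf(name = "toy point",
            geometry = st_sfc(st_point(c(530000, 180000)), crs = 27700))

# The full CRS definition
print(st_crs(pt))

# Just the EPSG code
print(st_crs(pt)$epsg)

# The bounding box: the extent of the coordinates as stored
print(st_bbox(pt))
```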
#### Proj4
If you’re a spatial geek and you’re used to looking at London, then a quick glance at the values of the extent / bounding box (bbox) will tell you that you are working in British National Grid as the x and y values are in six figures, with x values around 520000 to 550000 and y values around 150000 to 200000. The other way of telling is by looking at the coordinate reference system (CRS) value --- in the files above it’s defined by the bit that says:
```+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy```
“Well that’s clear as mud!” I hear you cry! Yes, not obvious is it! This is called a proj-string or proj4-string and it’s the proj4-string for British National Grid. You can learn what each of the bits of this means [here](https://proj4.org/usage/quickstart.html).
The proj4-string basically tells the computer where on the earth to locate the coordinates that make up the geometries in your file and what distortions to apply (e.g. whether to flatten it out completely).
24. Sometimes you can download data from the web and it doesn’t have a CRS. If any boundary data you download does not have a coordinate reference system attached to it (NA is displayed in the coord. ref section), this is not a huge problem --- it can be added afterwards by adding the proj4string to the file.
To find the proj4-strings for a whole range of different geographic projections, use the search facility at http://spatialreference.org/ or http://epsg.io/.
#### EPSG
Now, if you can store a whole proj4-string in your mind, you must be some kind of savant (why are you doing this course? you could make your fortune as a card-counting poker player or something!). The rest of us need something a little bit easier to remember and for coordinate reference systems, the saviour is the European Petroleum Survey Group (EPSG) --- (naturally!). Now managed and maintained by the [International Association of Oil and Gas Producers](http://www.epsg.org/) --- EPSG codes are short numbers that represent all coordinate reference systems in the world and link directly to proj4 strings.
The EPSG code for British National Grid is 27700 --- http://epsg.io/27700. The EPSG code for the WGS84 World Geodetic System (usually the default CRS for most spatial data) is 4326 --- http://epsg.io/4326
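You can ask ```sf``` to show you the proj4-style string that sits behind an EPSG code, which is a nice way of convincing yourself the two really are linked. A small sketch (the variable names and toy points are mine; the exact text of the strings depends on the version of PROJ installed on your machine):

```r
library(sf)

# Look up the two CRSs by EPSG code
bng <- st_crs(27700)
wgs84 <- st_crs(4326)

# The proj4-style string behind each code
print(bng$proj4string)
print(wgs84$proj4string)

# Is a layer in a geographic (longitude/latitude) CRS?
pt_wgs84 <- st_sfc(st_point(c(0, 51.5)), crs = 4326)
pt_bng <- st_sfc(st_point(c(530000, 180000)), crs = 27700)
print(st_is_longlat(pt_wgs84))  # TRUE
print(st_is_longlat(pt_bng))    # FALSE
```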
26. If your boundary data doesn’t have a spatial reference system, you can read it in and set the projection either with the full proj4 string, or, more easily, with the EPSG code:
```{r, cache=TRUE, warning=FALSE, message=FALSE}
# read borough map in and explicitly set projection to British National Grid
# using the EPSG code 27700
BoroughMapSF <- st_read("prac1_data/statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp")
BoroughMapSF <- st_set_crs(BoroughMapSF, 27700)
```
```{r, cache=TRUE, warning=FALSE, message=FALSE}
#or more concisely
BoroughMapSF <- st_read("prac1_data/statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp") %>% st_set_crs(27700)
```
27. Another option is to use the function ```readOGR()``` from the ```rgdal``` package:
```{r, message=FALSE, warning=FALSE, cache=TRUE}
BoroughMapSP <- readOGR("prac1_data/statistical-gis-boundaries-london/ESRI/London_Borough_Excluding_MHW.shp")
#store the EPSG code that references the proj4string in a variable
#(EPSG codes are shorter and easier to remember than the full strings!)
UKBNG <- "+init=epsg:27700"
#now set the proj4string for your BoroughMap object
#note, this will probably throw an error if your dataset already has a
#CRS, this is just for demonstration...
proj4string(BoroughMapSP) <- CRS(UKBNG)
```
```{r, cache=TRUE}
print(BoroughMapSP) # check for new CRS
```
#### Reprojecting your spatial data
Reprojecting your data is something that you might have to (or want to) do, on occasion. Why? Well, one example might be if you want to measure the length of a line object, or the distance between two polygons. This can be done far more easily in a projected coordinate system like British National Grid (where the units are metres) than it can in a geographic coordinate system such as WGS84 (where the units are degrees).
For generating maps in packages like ```leaflet```, your maps will also need to be in WGS84, rather than British National Grid.
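To make the point about units concrete, here is a minimal sketch measuring the distance between two made-up points in British National Grid. I have picked the coordinates so the points are 3 km apart east-west and 4 km apart north-south.

```r
library(sf)

# Two toy points in British National Grid
a <- st_sfc(st_point(c(530000, 180000)), crs = 27700)
b <- st_sfc(st_point(c(533000, 184000)), crs = 27700)

# In a projected CRS the answer comes back in the CRS units (metres)
d <- st_distance(a, b)
print(d)  # 5000 m (a 3-4-5 triangle)
```

Because the coordinates are already in metres, the result is just Pythagoras on the easting and northing differences.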
28. So once your data has a coordinate system to work with, we can re-project or transform to anything we like. The most commonly used is the global latitude and longitude system (WGS84). With SP objects, this is carried out using the ```spTransform()``` function:
```{r, cache=TRUE}
BoroughMapSPWGS84 <-spTransform(BoroughMapSP, CRS("+proj=longlat +datum=WGS84"))
print(BoroughMapSPWGS84)
```
```{r, cache=TRUE}
#transform it back again:
BoroughMapSPBNG <-spTransform(BoroughMapSPWGS84, CRS(UKBNG))
print(BoroughMapSPBNG)
```
```{r, cache=TRUE}
#You may want to create a similar variable for WGS84
latlong <- "+init=epsg:4326"
```
And for SF objects it’s carried out using ```st_transform```:
```{r, cache=TRUE}
BoroughMapSFWGS84 <- st_transform(BoroughMapSF, 4326)
print(BoroughMapSFWGS84)
```
In the SF object, you can compare the values in the geometry column with those in the original file to look at how they have changed…
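One thing that trips people up is the difference between setting a CRS and transforming to one. The sketch below uses a single made-up point to show it: `st_transform()` recalculates the coordinates, whereas `st_set_crs()` only changes the label attached to them.

```r
library(sf)

# Toy point in British National Grid
pt_bng <- st_sfc(st_point(c(530000, 180000)), crs = 27700)

# st_transform() recalculates the coordinates
pt_wgs <- st_transform(pt_bng, 4326)
print(st_coordinates(pt_bng))
print(st_coordinates(pt_wgs))  # now in degrees

# st_set_crs() only changes the label (sf warns when replacing a CRS),
# so the numbers are left untouched
pt_relabelled <- suppressWarnings(st_set_crs(pt_bng, 4326))
print(st_coordinates(pt_relabelled))  # still 530000, 180000
```

So use `st_set_crs()` only when the CRS is missing or recorded wrongly, and `st_transform()` when you genuinely want the data in a different system.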
### Maps with extra features
Now we can re-project our data, it frees us up to bring in, for example, different base maps and other stuff...
How about adding a basemap to our map...this follows on from the earlier section (when I said ```read_osm()``` wasn't working) and builds on what we just learned about mapping with ```ggplot```.
```{r message=FALSE, cache=TRUE, eval=FALSE}
install.packages(c("ggmap", "BAMMtools"))
```
```{r, message=FALSE, cache=TRUE}
library(ggmap)
library(BAMMtools)
# put into WGS84 to get basemap data
BoroughDataMapWGS84<-st_transform(BoroughDataMap, 4326)
# bounding box of London
londonbbox2 <- as.vector(st_bbox(BoroughDataMapWGS84))
# use bounding box to get our basemap
b <- get_map(londonbbox2,
maptype="roadmap",
source="osm",
zoom=10)
plot(b)
# now fetch a Stamen basemap; try changing the maptype to "watercolor"
b <- get_stamenmap(londonbbox2,
zoom = 10,
source="osm",
maptype = "toner-lite")
plot(b)
# work out the jenks breaks of our data, k sets the number of breaks
jenks=getJenksBreaks(BoroughDataMapWGS84$Rate.of.JobSeekers.Allowance..JSA..Claimants...2015,
k=5)
# set the palette using colorbrewer
palette1<- scale_fill_distiller(type = "seq",
palette = "YlOrBr",
breaks=jenks,
guide="legend")
# any labels
labels<-labs(title="Rate per 1,000 people",
x="Longitude",
y="Latitude")