forked from orb16/Creating-maps-in-R
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathintro-spatial.tex
1809 lines (1527 loc) · 79.7 KB
/
intro-spatial.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[]{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
\usepackage[utf8]{inputenc}
\else % if luatex or xelatex
\ifxetex
\usepackage{mathspec}
\usepackage{xltxtra,xunicode}
\else
\usepackage{fontspec}
\fi
\defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
\newcommand{\euro}{€}
\fi
% use microtype if available
\IfFileExists{microtype.sty}{\usepackage{microtype}}{}
\usepackage[margin=1in]{geometry}
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\newenvironment{Shaded}{}{}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{{#1}}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.56,0.13,0.00}{{#1}}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textit{{#1}}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{{#1}}}
\newcommand{\RegionMarkerTok}[1]{{#1}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\NormalTok}[1]{{#1}}
\usepackage{graphicx}
% Redefine \includegraphics so that, unless explicit options are
% given, the image width will not exceed the width of the page.
% Images get their normal width if they fit onto the page, but
% are scaled down if they would overflow the margins.
\makeatletter
\def\ScaleIfNeeded{%
\ifdim\Gin@nat@width>\linewidth
\linewidth
\else
\Gin@nat@width
\fi
}
\makeatother
\let\Oldincludegraphics\includegraphics
{%
\catcode`\@=11\relax%
\gdef\includegraphics{\@ifnextchar[{\Oldincludegraphics}{\Oldincludegraphics[width=\ScaleIfNeeded]}}%
}%
\ifxetex
\usepackage[setpagesize=false, % page size defined by xetex
unicode=false, % unicode breaks when used with xetex
xetex]{hyperref}
\else
\usepackage[unicode=true]{hyperref}
\fi
\hypersetup{breaklinks=true,
bookmarks=true,
pdfauthor={Robin Lovelace ([email protected]) and James Cheshire ([email protected])},
pdftitle={Introduction to visualising spatial data in R},
colorlinks=true,
citecolor=blue,
urlcolor=blue,
linkcolor=magenta,
pdfborder={0 0 0}}
\urlstyle{same} % don't use monospace font for urls
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em} % prevent overfull lines
\setcounter{secnumdepth}{0}
\title{Introduction to visualising spatial data in R}
\author{Robin Lovelace
(\href{mailto:[email protected]}{[email protected]}) and James
Cheshire
(\href{mailto:[email protected]}{[email protected]})}
\date{July, 2014}
\begin{document}
\begin{center}
\huge Introduction to visualising spatial data in R \\[0.2cm]
\large \emph{Robin Lovelace
(\href{mailto:[email protected]}{[email protected]}) and James
Cheshire
(\href{mailto:[email protected]}{[email protected]})}\\[0.1cm]
\large \emph{July, 2014} \\
\normalsize
\end{center}
{
\hypersetup{linkcolor=black}
\setcounter{tocdepth}{2}
\tableofcontents
}
\newpage
\section{Part I: Introduction}\label{part-i-introduction}
This tutorial is an introduction to spatial data in R and map making
with R's `base' graphics and the popular graphics package
\textbf{ggplot2}. It assumes no prior knowledge of spatial data analysis
but prior understanding of the R command line would be beneficial. For
people new to R, we recommend working through an `Introduction to R'
type tutorial, such as ``A (very) short introduction to R''
(\href{http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf}{Torfs
and Brauer, 2012}) or the more geographically inclined ``Short
introduction to R''
(\href{http://www.social-statistics.org/wp-content/uploads/2012/12/intro_to_R1.pdf}{Harris,
2012}).
Building on such background material, the following set of exercises is
concerned with specific functions for spatial data and visualisation. It
is divided into five parts:
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
Introduction, which provides a guide to R's syntax and preparing for
the tutorial
\item
Spatial data in R, which describes basic spatial functions in R
\item
Manipulating spatial data, which includes changing projection,
clipping and spatial joins
\item
Map making with \textbf{ggplot2}, a recent graphics package for
producing beautiful maps quickly
\item
Taking spatial analysis in R further, a compilation of resources for
furthering your skills
\end{itemize}
An up-to-date version of this tutorial is maintained at
\href{https://github.com/Robinlovelace/Creating-maps-in-R/blob/master/intro-spatial-rl.pdf}{\url{https://github.com/Robinlovelace/Creating-maps-in-R}}.
The source files used to create this tutorial, including the input data
can be downloaded as a
\href{https://github.com/Robinlovelace/Creating-maps-in-R/archive/master.zip}{zip
file}, as described below. The entire tutorial was written in
\href{http://rmarkdown.rstudio.com/}{RMarkdown}, which allows R code to
run as the document compiles, ensuring reproducibility.
Any suggested improvements or new
\href{https://github.com/Robinlovelace/Creating-maps-in-R/tree/master/vignettes}{vignettes}
are welcome, via email to Robin or by
\href{https://help.github.com/articles/fork-a-repo}{forking} the
\href{https://github.com/Robinlovelace/Creating-maps-in-R/blob/master/intro-spatial.Rmd}{master
version} of this document.
\subsection{Typographic conventions and getting
help}\label{typographic-conventions-and-getting-help}
The colourful syntax highlighting in this document is thanks to
\href{http://rmarkdown.rstudio.com/}{RMarkdown}. We try to follow best
practice in terms of style, roughly following Google's style guide, an
in-depth guide written by
\href{http://cran.r-project.org/web/packages/rockchalk/vignettes/Rstyle.pdf}{Johnson
(2013)} and a \href{http://adv-r.had.co.nz/Style.html}{chapter} from
\href{http://adv-r.had.co.nz/}{\emph{Advanced R}} (Wickham, in press).
It is a good idea to get into the habit of consistent and clear writing
in any language, and R is no exception. Adding comments to your code is
also good practice, so you remember at a later date what you've done,
aiding the learning process. There are two main ways of commenting code
using the \texttt{\#} symbol: above a line of code or directly following
it, as illustrated in the block of code presented below, which should
create figure 1 if typed correctly into the R command line.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Generate data}
\NormalTok{x <-}\StringTok{ }\DecValTok{1}\NormalTok{:}\DecValTok{400}
\NormalTok{y <-}\StringTok{ }\KeywordTok{sin}\NormalTok{(x /}\StringTok{ }\DecValTok{10}\NormalTok{) *}\StringTok{ }\KeywordTok{exp}\NormalTok{(x *}\StringTok{ }\NormalTok{-}\FloatTok{0.01}\NormalTok{)}
\KeywordTok{plot}\NormalTok{(x, y) }\CommentTok{# plot the result}
\end{Highlighting}
\end{Shaded}
\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-2.pdf}
\caption{Basic plot of x and y}
\end{figure}
In the above code we first created a new \emph{object} that we have
called \texttt{x}. Any name could have been used, like
\texttt{x\_bumkin}, but \texttt{x} is concise and works just fine here.
It is good practice to give your objects meaningful names. Note
\texttt{\textless{}-}, the directional ``arrow'' assignment symbol. This
creates new objects. We will be using this symbol a lot in the
tutorial.\footnote{Tip: typing \texttt{Alt -} on the keyboard will
create it in RStudio. The equals sign \texttt{=} also works but is not
recommended by R developers.}
To distinguish between prose and code, please be aware of the following
typographic conventions: R code (e.g. \texttt{plot(x, y)}) is written in
a \texttt{monospace} font and package names (e.g. \textbf{rgdal}) are
written in \textbf{bold}. Blocks of code such as:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DecValTok{5}\NormalTok{)^}\DecValTok{2}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 1 4 9 25
\end{verbatim}
are compiled in-line: the \texttt{\#\#} indicates output from R.
Sometimes output from code in this tutorial is reduced to save space, so
do not be alarmed if R produces unexpected \texttt{warning} messages. In
a few cases we omit images to save space. This will be clear from the
comments. Images in this document are small and low-quality enable
portability of the pdf. They should display better on your computer
screen and can be saved at any resolution.
The code presented here is not the only way to do things: we encourage
you to play to gain a deeper understanding of R. Do not worry, you
cannot `break' anything using R and all the input data can be re-loaded
if things do go wrong. As with learning to skate, you learn by falling
and, getting an \texttt{Error:} message in R is much less painful than
landing on ones face on concrete! We encourage \texttt{Error:}s - it
means you are trying new things.
If you require help on any function, use the \texttt{help} function,
e.g. \texttt{help(plot)}. Because R users love being concise, this can
also be written as \texttt{?plot}. Feel free to use it at any point
you'd like more detail on a specific function (although R's help files
are famously cryptic for the un-initiated). Help on more general terms
can be found using the \texttt{??} symbol. To test this, try typing
\texttt{??regression}. For the most part, \emph{learning by doing} is a
good motto, so let's crack on and download some packages and then some
data.
\subsection{Prerequisites and
packages}\label{prerequisites-and-packages}
For this tutorial you need to install R, if you haven't already done so,
the latest version of which can be downloaded from
\href{http://cran.r-project.org/}{\url{http://cran.r-project.org/}}. A
number of R editors such as \href{http://www.rstudio.com/}{RStudio} can
be used to make R more user friendly, but these are not needed to
complete the tutorial.
R has a huge and growing number of spatial data packages. We recommend
taking a quick browse on R's main website:
\href{http://cran.r-project.org/web/views/Spatial.html}{\url{http://cran.r-project.org/web/views/Spatial.html}}.
The packages we will be using are \textbf{ggplot2}, \textbf{rgdal},
\textbf{rgeos}, \textbf{maptools}, \textbf{mapproj} and \textbf{ggmap}.
To test whether a package is installed, \textbf{ggplot2} for example,
enter \texttt{library(ggplot2)}. If you get an error message, it needs
to be installed: \texttt{install.packages("ggplot2")}. These will be
downloaded from CRAN (the Comprehensive R Archive Network); if you are
prompted to select a `mirror', select one that is close to your home. If
there is no output from R, this is good news: it means that the library
has already been installed on your computer. Install these packages now.
\section{Part II: Spatial data in R}\label{part-ii-spatial-data-in-r}
\subsection{Starting the tutorial}\label{starting-the-tutorial}
Now that we have taken a look at R's syntax and installed the necessary
packages, we can start looking at some real spatial data. This second
part introduces some spatial files that we will download from the
internet. Plotting and interrogating spatial objects are central spatial
data analysis in R, so we will focus on these elements in the next two
parts of the tutorial, before focussing on creating attractive maps in
Part IV.
\subsection{Downloading the data}\label{downloading-the-data}
Download the data for this tutorial now from :
\href{https://github.com/Robinlovelace/Creating-maps-in-R}{\url{https://github.com/Robinlovelace/Creating-maps-in-R}}.
Click on the ``Download ZIP'' button on the right hand side and once it
is downloaded unzip this to a new folder on your PC. Use the
\texttt{setwd} command to set the working directory to the folder where
the data is saved. If your username is ``username'' and you saved the
files into a folder called ``Creating-maps-in-R-master'' on your
Desktop, for example, you would type the following:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{setwd}\NormalTok{(}\StringTok{"C:/Users/username/Desktop/Creating-maps-in-R-master/"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
If you are working in RStudio, you can create a project that will
automatically set your working directory. To do this click ``Session''
from the top toolbar and select ``Set working directory \textgreater{}
choose directory''.
It is also worth taking a look at the input data in your file browser
before opening them in R, to get a feel for them. You could try opening
the file ``london\_sport.shp'', within the ``data'' folder of the
project, in a GIS program such as QGIS (which can be freely downloaded
from the internet), for example, to get a feel for it before loading it
into R. Also note that .shp files are composed of several files for each
object: you should be able to open ``london\_sport.dbf'' in a
spreadsheet program such as LibreOffice Calc. Once you've understood
something of this input data and where it lives, it's time to open it in
R.
\subsection{Loading the spatial data}\label{loading-the-spatial-data}
One of the most important steps in handling spatial data with R is the
ability to read in spatial data, such as
\href{http://en.wikipedia.org/wiki/Shapefile}{shapefiles} (a common
geographical file format). There are a number of ways to do this, the
most commonly used and versatile of which is \texttt{readOGR}. This
function, from the \textbf{rgdal} package, automatically extracts
information about the projection and the attributes of data.
\textbf{rgdal} is R's interface to the ``Geospatial Abstraction Library
(GDAL)'' which is used by other open source GIS packages such as QGIS
and enables R to handle a broader range of spatial data formats. If
you've not already \emph{installed} and loaded the rgdal package (as
described above for ggplot2) do so now:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(rgdal)}
\NormalTok{lnd_sport <-}\StringTok{ }\KeywordTok{readOGR}\NormalTok{(}\DataTypeTok{dsn =} \StringTok{"data"}\NormalTok{, }\StringTok{"london_sport"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## OGR data source with driver: ESRI Shapefile
## Source: "data", layer: "london_sport"
## with 33 features and 4 fields
## Feature type: wkbPolygon with 2 dimensions
\end{verbatim}
In the code above \texttt{dsn} stands for ``data source name'' and is an
\emph{argument} of the \emph{function} \texttt{readOGR}. Note that each
new argument is separated by a comma. The \texttt{dsn} argument in this
case is a \emph{character string} (indicated by quote marks like this
one \texttt{"}) that specifies the directory where the data files are
stored. R functions have a default order of arguments, so \texttt{dsn =}
does not actually need to be typed for the command to run. If the data
were stored in the current working directory, for example, one could use
\texttt{readOGR(".", "london\_sport")}. For clarity, it is good practice
to include argument names, such as \texttt{dsn} when learning new
functions and we continue this tradition below.
The next argument is another \emph{character string}: simply the name of
the file required. There is no need to add a file extension (e.g.
\texttt{.shp}) in this case.
The files beginning \texttt{london\_sport} in the \texttt{data/}
\href{https://github.com/Robinlovelace/Creating-maps-in-R/tree/master/data}{directory}
contain the 2001 borough population and percentage participating in
sporting activities from the
\href{http://data.london.gov.uk/datastore/package/active-people-survey-kpi-data-borough}{active
people survey}. The boundary data is from the
\href{http://www.ordnancesurvey.co.uk/oswebsite/opendata/}{Ordnance
Survey}.
For information about how to load different types of spatial data, the
help documentation for \texttt{readOGR} is a good place to start. This
can be accessed from within R by typing \texttt{?readOGR}. For another
worked example, in which a GPS trace is loaded, please see Cheshire and
Lovelace (2014).
\subsection{Basic plotting}\label{basic-plotting}
We have now created a new spatial object called ``sport'' from the
``london\_sport'' shapefile. Spatial objects are made up of a number of
different \emph{slots}, mainly the attribute \emph{slot} and the
geometry \emph{slot}. The attribute \emph{slot} can be thought of as an
attribute table and the geometry \emph{slot} is where the spatial object
(and it's attributes) lie in space. Lets now analyse the sport object
with some basic commands:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{head}\NormalTok{(lnd_sport@data, }\DataTypeTok{n =} \DecValTok{2}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## ons_label name Partic_Per Pop_2001
## 0 00AF Bromley 21.7 295535
## 1 00BD Richmond upon Thames 26.6 172330
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{mean}\NormalTok{(lnd_sport$Partic_Per)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 20.05
\end{verbatim}
Take a look at this output and notice the table format of the data and
the column names. There are two important symbols at work in the above
block of code: the \texttt{@} symbol in the first line of code is used
to refer to the attribute \emph{slot} of the object. The \texttt{\$}
symbol refers to a specific attribute (a variable with a column name) in
the \texttt{data} \emph{slot}, which was identified from the result of
running the first line of code. If you are using RStudio, test out the
auto-completion functionality by hitting \texttt{tab} before completing
the command - this can save you a lot of time in the long run.
The \texttt{head} function in the first line of the code above simply
means ``show the first few lines of data'', i.e.~the head. It's default
is to output the first 6 rows of the dataset (try simply
\texttt{head(lnd\_sport@data)}), but we can specify the number of lines
with \texttt{n = 2} after the comma. The second line of the code above
calculates the mean value of the variable \texttt{Partic\_Per} (sports
participation per 100 people) for each of the zones in the
\texttt{lnd\_sport} object. To explore \texttt{lnd\_sport} object
further, try typing \texttt{nrow(lnd\_sport)} and record how many zones
the dataset contains. You can also try \texttt{ncol(lnd\_sport)}.
Now we have seen something of the attribute \emph{slot} of the spatial
object, let us look at its \emph{geometry}, which describes where the
polygons are located in space:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{plot}\NormalTok{(lnd_sport) }\CommentTok{# not shown in tutorial - try it on your computer}
\end{Highlighting}
\end{Shaded}
\texttt{plot} is one of the most useful functions in R, as it changes
its behaviour depending on the input data (this is called
\emph{polymorphism} by computer scientists). Inputting another object
such as \texttt{plot(lnd\_sport@data)} will generate an entirely
different type of plot. Thus R is intelligent at guessing what you want
to do with the data you provide it with.
R has powerful subsetting capabilities that can be accessed very
concisely using square brackets, as shown in the following example:
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{#select rows from attribute slot of lnd_sport object, where sports participation is less than 15.}
\NormalTok{lnd_sport@data[lnd_sport$Partic_Per <}\StringTok{ }\DecValTok{15}\NormalTok{, ]}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## ons_label name Partic_Per Pop_2001
## 17 00AQ Harrow 14.8 206822
## 21 00BB Newham 13.1 243884
## 32 00AA City of London 9.1 7181
\end{verbatim}
The above line of code asked R to select rows from the
\texttt{lnd\_sport} object, where sports participation is lower than 15,
in this case rows 17, 21 and 32, which are Harrow, Newham and the city
centre respectively. The square brackets work as follows: anything
before the comma refers to the rows that will be selected, anything
after the comma refers to the number of columns that should be returned.
For example if the data frame had 1000 columns and you were only
interested in the first two columns you could specify \texttt{1:2} after
the comma. The ``:'' symbol simply means ``to'', i.e.~columns 1 to 2.
Try experimenting with the square brackets notation (e.g.~guess the
result of \texttt{lnd\_sport@data{[}1:2, 1:3{]}} and test it): it will
be useful.
So far we have been interrogating only the attribute \emph{slot}
(\texttt{@data}) of the \texttt{lnd\_sport} object, but the square
brackets can also be used to subset spatial objects, i.e.~the geometry
\emph{slot}. Using the same logic as before try to plot a subset of
zones with high sports participation.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{#plot zones from sports object where sports participation is greater than 25.}
\KeywordTok{plot}\NormalTok{(lnd_sport[lnd_sport$Partic_Per >}\StringTok{ }\DecValTok{25}\NormalTok{, ]) }\CommentTok{# output not shown in tutorial}
\end{Highlighting}
\end{Shaded}
This is useful, but it would be great to see these sporty areas in
context. To do this, simply use the \texttt{add = TRUE} argument after
the initial plot. (\texttt{add = T} would also work, but we like to
spell things out in this tutorial for clarity). What does the
\texttt{col} argument refer to in the below block - it should be obvious
(see figure 2).
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{plot}\NormalTok{(lnd_sport)}
\KeywordTok{plot}\NormalTok{(lnd_sport[lnd_sport$Partic_Per >}\StringTok{ }\DecValTok{25}\NormalTok{,], }\DataTypeTok{col =} \StringTok{"blue"}\NormalTok{, }\DataTypeTok{add =} \OtherTok{TRUE}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-10.pdf}
\caption{Preliminary plot of London with areas of high sports
participation highlighted in blue}
\end{figure}
Congratulations! You have just interrogated and visualised a spatial
object: where are areas with high levels of sports participation in
London? The map tells us. Do not worry for now about the intricacies of
how this was achieved: you have learned vital basics of how R works as a
language; we will cover this in more detail in subsequent sections.
While we are on the topic of loading data, it is worth pointing out that
R can save and load data efficiently into its own data format
(\texttt{.RData}). Try \texttt{save(lnd\_sport, file = "sport.RData")}
and see what happens. If you type \texttt{rm(lnd\_sport)} (which removes
the object) and then \texttt{load("sport.RData")} you should see how
this works. \texttt{lnd\_sport} will disappear from the workspace and
then reappear.
\subsection{Attribute data}\label{attribute-data}
All shapefiles have both attribute table and geometry data. These are
automatically loaded with \texttt{readOGR}. The loaded attribute data
can be treated the same as an R
\href{http://www.statmethods.net/input/datatypes.html}{data frame}.
R deliberately hides the geometry of spatial data unless you print the
entire object (try typing \texttt{print(lnd\_sport)}). Let's take a look
at the headings of sport, using the following command:
\texttt{names(lnd\_sport)} Remember, the attribute data contained in
spatial objects are kept in a `slot' that can be accessed using the
\texttt{@} symbol: \texttt{lnd\_sport@data}. This is useful if you do
not wish to work with the spatial components of the data at all times.
Type \texttt{summary(lnd\_sport)} to get some additional information
about the data object. Spatial objects in R contain much additional
information:
\begin{verbatim}
summary(lnd_sport)
## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x 503571.2 561941.1
## y 155850.8 200932.5
## Is projected: TRUE
## proj4string :
## [+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 ....]
\end{verbatim}
The above output tells us that \texttt{lnd\_sport} is a special spatial
class, in this case a \texttt{SpatialPolygonsDataFrame}, meaning it is
composed of various polygons, each of which has attributes. This is the
typical class of data found in administrative zones. The coordinates
tell us what the maximum and minimum x and y values are, for plotting.
Finally, we are told something of the coordinate reference system with
the \texttt{Is projected} and \texttt{proj4string} lines. In this case,
we have a projected system, which means it is a Cartesian reference
system, relative to some point on the surface of the Earth. We will
cover reprojecting data in the next part of the tutorial.
\section{Part III: Manipulating spatial
data}\label{part-iii-manipulating-spatial-data}
It is all very well being able to load and interrogate spatial data in
R, but to compete with modern GIS packages, R must also be able to
modify these spatial objects (see
`\href{https://github.com/Pakillo/R-GIS-tutorial}{using R as a GIS}'). R
has a wide range of very powerful functions for this, many of which
reside in additional packages alluded to in the introduction.
This course is introductory so only commonly required data manipulation
tasks, \emph{reprojecting} and \emph{joining/clipping} are covered here.
We will look at joining non-spatial data to our spatial object. We will
then cover spatial joins, whereby data is joined to other dataset based
on spatial location.
\subsection{Changing projection}\label{changing-projection}
Before undertaking spatial queries of an object, it is useful to know
the \emph{coordinate reference system} (CRS) it uses. You may have
noticed the word \texttt{proj4string} in the summary of the
\texttt{lnd\_sport} object above. This represents it CRS mathematically.
In some spatial data files, no CRS is specified or worse, and incorrect
CRS value is given. Provided the correct CRS is known, this can be
righted with a single line:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{proj4string}\NormalTok{(lnd_sport) <-}\StringTok{ }\KeywordTok{CRS}\NormalTok{(}\StringTok{"+init=epsg:27700"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
R issues a warning when changing the CRS in this way to ensure the user
knows that they are simply changing the CRS, not \emph{reprojecting} the
data. R uses \href{}{EPSG codes} to refer to different coordinate
reference systems. \texttt{27700} is the code for British National Grid.
A commonly used geographical (`lat/lon') CRS is `WGS84', whose EPSG code
is \texttt{4326}. The following code shows how to search the list of
available EPSG codes and create a new version of \texttt{lnd\_sport} in
WGS84:\footnote{Note: entering \texttt{projInfo()} will provide
additional CRS options available from \textbf{rgdal}.}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{EPSG <-}\StringTok{ }\KeywordTok{make_EPSG}\NormalTok{() }\CommentTok{# create data frame of available EPSG codes}
\NormalTok{EPSG[}\KeywordTok{grepl}\NormalTok{(}\StringTok{"WGS 84$"}\NormalTok{, EPSG$note), ] }\CommentTok{# search for WGS 84 code }
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## code note prj4
## 249 4326 # WGS 84 +proj=longlat +datum=WGS84 +no_defs
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_sport_wgs84 <-}\StringTok{ }\KeywordTok{spTransform}\NormalTok{(lnd_sport, }\KeywordTok{CRS}\NormalTok{(}\StringTok{"+init=epsg:4326"}\NormalTok{)) }\CommentTok{# reproject}
\end{Highlighting}
\end{Shaded}
The above code uses the function \texttt{spTransform}, from the
\textbf{sp} package, to convert the \texttt{lnd\_sport} object into a
new form, with the Coordinate Reference System (CRS) specified as WGS84.
The different epsg codes are a bit of hassle to remember but you can
search for them at
\href{http://spatialreference.org/}{spatialreference.org}.
\subsection{Attribute joins}\label{attribute-joins}
Attribute joins are used to link additional pieces of information to our
polygons. in the \texttt{lnd\_sport} object, for example, we have 5
attribute variables - that can be found by typing
\texttt{names(lnd\_sport)}. But what happens when we want to add an
additional variable from an external data table? We will use the example
of recorded crimes by borough to demonstrate this.
To reaffirm our starting point, let's re-load the ``london\_sport''
shapefile as a new object and plot it. This is identical to the
\texttt{lnd\_sport} object in the first instance, but we will give it a
new name, in case we ever need to re-use \texttt{lnd\_sport}. We will
call this new object \texttt{lnd}, short for London:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(rgdal) }\CommentTok{# ensure rgdal is loaded}
\CommentTok{# Create new object called "lnd" from "london_sport" shapefile}
\NormalTok{lnd <-}\StringTok{ }\KeywordTok{readOGR}\NormalTok{(}\DataTypeTok{dsn =} \StringTok{"data"}\NormalTok{, }\StringTok{"london_sport"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## OGR data source with driver: ESRI Shapefile
## Source: "data", layer: "london_sport"
## with 33 features and 4 fields
## Feature type: wkbPolygon with 2 dimensions
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{plot}\NormalTok{(lnd) }\CommentTok{# plot the lnd object}
\end{Highlighting}
\end{Shaded}
\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-14.pdf}
\caption{Plot of London}
\end{figure}
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{nrow}\NormalTok{(lnd) }\CommentTok{# return the number of rows}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] 33
\end{verbatim}
The non-spatial data we are going to join to the \texttt{lnd} object
contains records on crimes in London. This is stored in a
comma-delimited {[}(\texttt{.csv}){]} file called
``mps-recordedcrime-borough''. Viewing the
\href{https://raw.githubusercontent.com/Robinlovelace/Creating-maps-in-R/master/data/mps-recordedcrime-borough.csv}{file}
locally shows that each row representing a single reported crime. We are
going to use a function called \texttt{aggregate} to pre-process these
records, ready to join to our spatial \texttt{lnd} dataset. A new object
called \texttt{crime\_data} is created to store this data.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Create and look at new crime_data object}
\NormalTok{crime_data <-}\StringTok{ }\KeywordTok{read.csv}\NormalTok{(}\StringTok{"data/mps-recordedcrime-borough.csv"}\NormalTok{,}
\DataTypeTok{fileEncoding =} \StringTok{"UCS-2LE"}\NormalTok{)}
\KeywordTok{head}\NormalTok{(crime_data) }\CommentTok{# display first 6 lines}
\KeywordTok{summary}\NormalTok{(crime_data$MajorText) }\CommentTok{# summary of crime type}
\CommentTok{# Extract "Theft & Handling" crimes and save}
\NormalTok{crime_theft <-}\StringTok{ }\NormalTok{crime_data[crime_data$MajorText ==}\StringTok{ "Theft & Handling"}\NormalTok{, ]}
\KeywordTok{head}\NormalTok{(crime_theft, }\DecValTok{2}\NormalTok{) }\CommentTok{# take a look at the result (replace 2 with 10 to see more rows)}
\CommentTok{# Calculate the sum of the crime count for each district and save result as a new object}
\NormalTok{crime_ag <-}\StringTok{ }\KeywordTok{aggregate}\NormalTok{(CrimeCount ~}\StringTok{ }\NormalTok{Spatial_DistrictName, }\DataTypeTok{FUN =} \NormalTok{sum,}
\DataTypeTok{data =} \NormalTok{crime_theft)}
\CommentTok{# Show the first two rows of the aggregated crime data}
\KeywordTok{head}\NormalTok{(crime_ag, }\DecValTok{2}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
There is a lot going on in the above block of code and you should not
expect to understand all of it upon first try: simply typing the
commands and thinking briefly about the outputs is all that is needed at
this stage to improve your intuitive understanding of R. It is worth
pointing out a few things that you may not have seen before that will
likely be useful in the future:
\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
In the first line of code the \texttt{fileEncoding} argument is used.
This is rarely necessary, but in this case the file comes in a strange
file format. 9 times out of ten you can omit this argument but it's
worth knowing about.
\item
The \texttt{which} function is used to select only those observations
that meet a specific condition, in this case all crimes involving
``Theft and Handling''.
\item
The \texttt{\textasciitilde{}} symbol means ``by'': we aggregated the
\texttt{CrimeCount} variable by the district name.
\end{itemize}
Now that we have crime data at the borough level
(\texttt{Spatial\_DistrictName}), the challenge is to join it to the
\texttt{lnd} object. We will base our join on the
\texttt{Spatial\_DistrictName} variable from the \texttt{crime\_ag}
object and the \texttt{name} variable from the \texttt{lnd} object. It
is not always straight forward to join objects based on names as the
names do not always match. Let us see which names in the
\texttt{crime\_ag} object match the spatial data object, \texttt{lnd}:
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Compare the name column in lnd to Spatial_DistrictName column in crime_ag to see which rows match.}
\NormalTok{lnd$name %in%}\StringTok{ }\NormalTok{crime_ag$Spatial_DistrictName}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [12] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [23] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Return rows which do not match}
\NormalTok{lnd$name[!lnd$name %in%}\StringTok{ }\NormalTok{crime_ag$Spatial_DistrictName]}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] City of London
## 33 Levels: Barking and Dagenham Barnet Bexley Brent Bromley ... Westminster
\end{verbatim}
The first line of code above uses the \texttt{\%in\%} command to
identify which values in \texttt{lnd\$name} are also contained in the
names of the crime data. The results indicate that all but one of the
borough names matches. The second line of code tells us that it is City
of London, row 25, that is named differently in the crime data. Look at
the results (not shown here) on your computer.
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Discover the names of the names}
\KeywordTok{levels}\NormalTok{(crime_ag$Spatial_DistrictName)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] "Barking and Dagenham" "Barnet"
## [3] "Bexley" "Brent"
## [5] "Bromley" "Camden"
## [7] "Croydon" "Ealing"
## [9] "Enfield" "Greenwich"
## [11] "Hackney" "Hammersmith and Fulham"
## [13] "Haringey" "Harrow"
## [15] "Havering" "Hillingdon"
## [17] "Hounslow" "Islington"
## [19] "Kensington and Chelsea" "Kingston upon Thames"
## [21] "Lambeth" "Lewisham"
## [23] "Merton" "Newham"
## [25] "NULL" "Redbridge"
## [27] "Richmond upon Thames" "Southwark"
## [29] "Sutton" "Tower Hamlets"
## [31] "Waltham Forest" "Wandsworth"
## [33] "Westminster"
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Rename row 25 in crime_ag to match row 25 in lnd, as suggested results form above}
\KeywordTok{levels}\NormalTok{(crime_ag$Spatial_DistrictName)[}\DecValTok{25}\NormalTok{] <-}
\StringTok{ }\KeywordTok{as.character}\NormalTok{(lnd$name[!lnd$name %in%}\StringTok{ }\NormalTok{crime_ag$Spatial_DistrictName])}
\NormalTok{lnd$name %in%}\StringTok{ }\NormalTok{crime_ag$Spatial_DistrictName }\CommentTok{# now all columns match}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [29] TRUE TRUE TRUE TRUE TRUE
\end{verbatim}
The above code block first identified the row with the faulty name and
then renamed the level to match the \texttt{lnd} dataset. Note that we
could not rename the variable directly, as it is stored as a factor.
We are now ready to join the datasets. It is recommended to use the
\texttt{join} function in the \textbf{plyr} package but the
\texttt{merge} function could equally be used. Note that when we ask for
help for a function that is not loaded, nothing happens, indicating we
need to load it:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{help}\NormalTok{(join) }\CommentTok{# error flagged}
\KeywordTok{library}\NormalTok{(plyr)}
\KeywordTok{help}\NormalTok{(join) }\CommentTok{# should now be loaded}
\end{Highlighting}
\end{Shaded}
The documentation for join will be displayed if the plyr package is
loaded (if not, load or install and load it!). It requires all joining
variables to have the same name, so we will rename the variable to make
the join work:
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{head}\NormalTok{(lnd$name)}
\KeywordTok{head}\NormalTok{(crime_ag$Spatial_DistrictName) }\CommentTok{# the variables to join}
\NormalTok{crime_ag <-}\StringTok{ }\KeywordTok{rename}\NormalTok{(crime_ag, }\DataTypeTok{replace =} \KeywordTok{c}\NormalTok{(}\StringTok{"Spatial_DistrictName"} \NormalTok{=}\StringTok{ "name"}\NormalTok{))}
\KeywordTok{head}\NormalTok{(}\KeywordTok{join}\NormalTok{(lnd@data, crime_ag)) }\CommentTok{# test it works}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## Joining by: name
\end{verbatim}
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd@data <-}\StringTok{ }\KeywordTok{join}\NormalTok{(lnd@data, crime_ag)}
\end{Highlighting}
\end{Shaded}
\begin{verbatim}
## Joining by: name
\end{verbatim}
Take a look at the \texttt{lnd@data} object. You should see new
variables added, meaning the attribute join was successful.
\subsection{Clipping and spatial
joins}\label{clipping-and-spatial-joins}
In addition to joining by zone name, it is also possible to do
\href{http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html\#//00080000000q000000}{spatial
joins} in R. There are three main varieties: many-to-one, where the
values of many intersecting objects contribute to a new variable in the
main table, one-to-many, or one-to-one. Because boroughs in London are
quite large, we will conduct a many-to-one spatial join. We will be
using transport infrastructure points such as tube stations and
roundabouts as the spatial data to join, with the aim of finding out
about how many are found in each London borough.
\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(rgdal)}
\CommentTok{#create new stations object using the "lnd-stns" shapefile.}
\NormalTok{stations <-}\StringTok{ }\KeywordTok{readOGR}\NormalTok{(}\DataTypeTok{dsn =} \StringTok{"data"}\NormalTok{, }\DataTypeTok{layer =} \StringTok{"lnd-stns"}\NormalTok{)}
\KeywordTok{proj4string}\NormalTok{(stations) }\CommentTok{# this is the full geographical detail.}
\KeywordTok{proj4string}\NormalTok{(lnd)}
\CommentTok{#return the bounding box of the stations object}
\KeywordTok{bbox}\NormalTok{(stations)}
\CommentTok{#return the bounding box of the lnd object}
\KeywordTok{bbox}\NormalTok{(lnd)}
\end{Highlighting}
\end{Shaded}
The above code loads the data correctly, but also shows that there are
problems with it: the Coordinate Reference System (CRS) of
\texttt{stations} differs from that of our \texttt{lnd} object. OSGB
1936 (or \href{http://spatialreference.org/ref/epsg/27700/}{EPSG 27700})
is the official CRS for the UK, so we will convert the object to this:
\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Create reprojected stations object}
\NormalTok{stations27700 <-}\StringTok{ }\KeywordTok{spTransform}\NormalTok{(stations, }\DataTypeTok{CRSobj =} \KeywordTok{CRS}\NormalTok{(}\KeywordTok{proj4string}\NormalTok{(lnd)))}
\NormalTok{stations <-}\StringTok{ }\NormalTok{stations27700 }\CommentTok{# overwrite the stations object with stations27700}
\KeywordTok{rm}\NormalTok{(stations27700) }\CommentTok{# remove the stations27700 object to clear up}
\KeywordTok{plot}\NormalTok{(lnd) }\CommentTok{# plot London for context (see figure 4 below)}
\KeywordTok{points}\NormalTok{(stations) }\CommentTok{# overlay the station points}
\end{Highlighting}
\end{Shaded}
\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-22.pdf}
\caption{Sampling and plotting stations}
\end{figure}
Now we can clearly see that the \texttt{stations} points overlay the
boroughs. The problem is that the spatial extent of \texttt{stations} is
great than that of \texttt{lnd}. We will take a spatially determined
subset of the stations object that fall inside greater London. This is
\emph{clipping}.
Two functions can be used to clip \texttt{stations} so that only those
falling within London boroughs are retained: \texttt{sp::over}, and
\texttt{rgeos::gIntersects} (the word preceding the \texttt{::} symbol
refers to the package which the function is from). Use \texttt{?}
followed by the function to get help on each. Whether
\texttt{gIntersects} of \texttt{over} is needed depends on the spatial
data classes being compared (Bivand et al. 2013).
In this tutorial we will use the \texttt{over} function as it is easiest
to use. In fact, it can be called just by using square brackets:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{stations <-}\StringTok{ }\NormalTok{stations[lnd, ]}
\KeywordTok{plot}\NormalTok{(stations) }\CommentTok{# test the clip succeeded (see figure 5)}
\end{Highlighting}
\end{Shaded}
\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-23.pdf}
\caption{The clipped stations dataset}
\end{figure}
The above line of code says: ``output all \texttt{stations} within the
\texttt{lnd} object bounds''. This is an incredibly concise way of
clipping and has the added advantage of being consistent with R's syntax
for non-spatial clipping. To prove it worked, only stations within the
London boroughs appear in the plot.
\texttt{gIntersects} can achieve the same result, but with more lines of
code (see
\href{http://www.rpubs.com/RobinLovelace/11796}{www.rpubs.com/RobinLovelace}
for more on this) . It may seem confusing that two different functions
can be used to generate the same result. However, this is a common issue
in R; the question is finding the most appropriate solution.
In its less concise form (without use of square brackets), \texttt{over}
takes two main input arguments: the target layer (the layer to be
altered) and the source layer by which the target layer is to be
clipped. The output of \texttt{over} is a data frame of the same
dimensions as the original object (in this case \texttt{stations}),
except that the points which fall outside the zone of interest are set
to a value of \texttt{NA} (``no answer''). We can use this to make a
subset of the original polygons, remembering the square bracket notation
described in the first section. We create a new object, \texttt{sel}
(short for ``selection''), containing the indices of all relevant
polygons:
\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{sel <-}\StringTok{ }\KeywordTok{over}\NormalTok{(stations, lnd)}
\NormalTok{stations <-}\StringTok{ }\NormalTok{stations[!}\KeywordTok{is.na}\NormalTok{(sel[,}\DecValTok{1}\NormalTok{]),]}
\end{Highlighting}
\end{Shaded}
Typing \texttt{summary(sel)} should provide insight into how this
worked: it is a dataframe with 1801 NA values, representing zones
outside of the London polygon. Note that the preceding two lines of code
is equivalent to the single line of code,
\texttt{stations \textless{}- stations{[}lnd, {]}}. The next section
demonstrates spatial aggregation, a more advanced version of spatial
subsetting.
\subsection{Spatial aggregation}\label{spatial-aggregation}
As with R's very terse code for spatial subsetting, the base function
\texttt{aggregate} (which provides summaries of variables based on some
grouping variable) also behaves differently when the inputs are spatial
objects.