
\documentclass[]{article}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{amssymb,amsmath}
\usepackage{ifxetex,ifluatex}
\usepackage{fixltx2e} % provides \textsubscript
% use upquote if available, for straight quotes in verbatim environments
\IfFileExists{upquote.sty}{\usepackage{upquote}}{}
\ifnum 0\ifxetex 1\fi\ifluatex 1\fi=0 % if pdftex
  \usepackage[utf8]{inputenc}
\else % if luatex or xelatex
  \ifxetex
    \usepackage{mathspec}
    \usepackage{xltxtra,xunicode}
  \else
    \usepackage{fontspec}
  \fi
  \defaultfontfeatures{Mapping=tex-text,Scale=MatchLowercase}
  \newcommand{\euro}{€}
\fi
% use microtype if available
\IfFileExists{microtype.sty}{\usepackage{microtype}}{}
\usepackage[margin=1in]{geometry}
\usepackage{color}
\usepackage{fancyvrb}
\newcommand{\VerbBar}{|}
\newcommand{\VERB}{\Verb[commandchars=\\\{\}]}
\DefineVerbatimEnvironment{Highlighting}{Verbatim}{commandchars=\\\{\}}
% Add ',fontsize=\small' for more characters per line
\newenvironment{Shaded}{}{}
\newcommand{\KeywordTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{\textbf{{#1}}}}
\newcommand{\DataTypeTok}[1]{\textcolor[rgb]{0.56,0.13,0.00}{{#1}}}
\newcommand{\DecValTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\BaseNTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\FloatTok}[1]{\textcolor[rgb]{0.25,0.63,0.44}{{#1}}}
\newcommand{\CharTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\StringTok}[1]{\textcolor[rgb]{0.25,0.44,0.63}{{#1}}}
\newcommand{\CommentTok}[1]{\textcolor[rgb]{0.38,0.63,0.69}{\textit{{#1}}}}
\newcommand{\OtherTok}[1]{\textcolor[rgb]{0.00,0.44,0.13}{{#1}}}
\newcommand{\AlertTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\FunctionTok}[1]{\textcolor[rgb]{0.02,0.16,0.49}{{#1}}}
\newcommand{\RegionMarkerTok}[1]{{#1}}
\newcommand{\ErrorTok}[1]{\textcolor[rgb]{1.00,0.00,0.00}{\textbf{{#1}}}}
\newcommand{\NormalTok}[1]{{#1}}
\usepackage{graphicx}
% Redefine \includegraphics so that, unless explicit options are
% given, the image width will not exceed the width of the page.
% Images get their normal width if they fit onto the page, but
% are scaled down if they would overflow the margins.
\makeatletter
\def\ScaleIfNeeded{%
  \ifdim\Gin@nat@width>\linewidth
    \linewidth
  \else
    \Gin@nat@width
  \fi
}
\makeatother
\let\Oldincludegraphics\includegraphics
{%
 \catcode`\@=11\relax%
 \gdef\includegraphics{\@ifnextchar[{\Oldincludegraphics}{\Oldincludegraphics[width=\ScaleIfNeeded]}}%
}%
\ifxetex
  \usepackage[setpagesize=false, % page size defined by xetex
              unicode=false, % unicode breaks when used with xetex
              xetex]{hyperref}
\else
  \usepackage[unicode=true]{hyperref}
\fi
\hypersetup{breaklinks=true,
            bookmarks=true,
            pdfauthor={Robin Lovelace (R.Lovelace@Leeds.ac.uk) and James Cheshire (james.cheshire@ucl.ac.uk)},
            pdftitle={Introduction to visualising spatial data in R},
            colorlinks=true,
            citecolor=blue,
            urlcolor=blue,
            linkcolor=magenta,
            pdfborder={0 0 0}}
\urlstyle{same}  % don't use monospace font for urls
\setlength{\parindent}{0pt}
\setlength{\parskip}{6pt plus 2pt minus 1pt}
\setlength{\emergencystretch}{3em}  % prevent overfull lines
\setcounter{secnumdepth}{0}

\title{Introduction to visualising spatial data in R}
\author{Robin Lovelace
(\href{mailto:R.Lovelace@Leeds.ac.uk}{R.Lovelace@Leeds.ac.uk}) and James
Cheshire
(\href{mailto:james.cheshire@ucl.ac.uk}{james.cheshire@ucl.ac.uk})}
\date{July, 2014}

\begin{document}

\begin{center}
\huge Introduction to visualising spatial data in R \\[0.2cm]
\large \emph{Robin Lovelace
(\href{mailto:R.Lovelace@Leeds.ac.uk}{R.Lovelace@Leeds.ac.uk}) and James
Cheshire
(\href{mailto:james.cheshire@ucl.ac.uk}{james.cheshire@ucl.ac.uk})}\\[0.1cm]
\large \emph{July, 2014} \\
\normalsize
\end{center}


{
\hypersetup{linkcolor=black}
\setcounter{tocdepth}{2}
\tableofcontents
}
\newpage

\section{Part I: Introduction}\label{part-i-introduction}

This tutorial is an introduction to spatial data in R and map making
with R's `base' graphics and the popular graphics package
\textbf{ggplot2}. It assumes no prior knowledge of spatial data analysis
but prior understanding of the R command line would be beneficial. For
people new to R, we recommend working through an `Introduction to R'
type tutorial, such as ``A (very) short introduction to R''
(\href{http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf}{Torfs
and Brauer, 2012}) or the more geographically inclined ``Short
introduction to R''
(\href{http://www.social-statistics.org/wp-content/uploads/2012/12/intro_to_R1.pdf}{Harris,
2012}).

Building on such background material, the following set of exercises is
concerned with specific functions for spatial data and visualisation. It
is divided into five parts:

\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  Introduction, which provides a guide to R's syntax and preparing for
  the tutorial
\item
  Spatial data in R, which describes basic spatial functions in R
\item
  Manipulating spatial data, which includes changing projection,
  clipping and spatial joins
\item
  Map making with \textbf{ggplot2}, a recent graphics package for
  producing beautiful maps quickly
\item
  Taking spatial analysis in R further, a compilation of resources for
  furthering your skills
\end{itemize}

An up-to-date version of this tutorial is maintained at
\href{https://github.com/Robinlovelace/Creating-maps-in-R/blob/master/intro-spatial-rl.pdf}{\url{https://github.com/Robinlovelace/Creating-maps-in-R}}.
The source files used to create this tutorial, including the input data,
can be downloaded as a
\href{https://github.com/Robinlovelace/Creating-maps-in-R/archive/master.zip}{zip
file}, as described below. The entire tutorial was written in
\href{http://rmarkdown.rstudio.com/}{RMarkdown}, which allows R code to
run as the document compiles, ensuring reproducibility.

Any suggested improvements or new
\href{https://github.com/Robinlovelace/Creating-maps-in-R/tree/master/vignettes}{vignettes}
are welcome, via email to Robin or by
\href{https://help.github.com/articles/fork-a-repo}{forking} the
\href{https://github.com/Robinlovelace/Creating-maps-in-R/blob/master/intro-spatial.Rmd}{master
version} of this document.

\subsection{Typographic conventions and getting
help}\label{typographic-conventions-and-getting-help}

The colourful syntax highlighting in this document is thanks to
\href{http://rmarkdown.rstudio.com/}{RMarkdown}. We try to follow best
practice in terms of style, roughly following Google's style guide, an
in-depth guide written by
\href{http://cran.r-project.org/web/packages/rockchalk/vignettes/Rstyle.pdf}{Johnson
(2013)} and a \href{http://adv-r.had.co.nz/Style.html}{chapter} from
\href{http://adv-r.had.co.nz/}{\emph{Advanced R}} (Wickham, in press).
It is a good idea to get into the habit of consistent and clear writing
in any language, and R is no exception. Adding comments to your code is
also good practice, so you remember at a later date what you've done,
aiding the learning process. There are two main ways of commenting code
using the \texttt{\#} symbol: above a line of code or directly following
it, as illustrated in the block of code presented below, which should
create figure 1 if typed correctly into the R command line.

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Generate data}
\NormalTok{x <-}\StringTok{ }\DecValTok{1}\NormalTok{:}\DecValTok{400}
\NormalTok{y <-}\StringTok{ }\KeywordTok{sin}\NormalTok{(x /}\StringTok{ }\DecValTok{10}\NormalTok{) *}\StringTok{ }\KeywordTok{exp}\NormalTok{(x *}\StringTok{ }\NormalTok{-}\FloatTok{0.01}\NormalTok{)}

\KeywordTok{plot}\NormalTok{(x, y) }\CommentTok{# plot the result}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-2.pdf}
\caption{Basic plot of x and y}
\end{figure}

In the above code we first created a new \emph{object} that we have
called \texttt{x}. Any name could have been used, like
\texttt{x\_bumkin}, but \texttt{x} is concise and works just fine here.
It is good practice to give your objects meaningful names. Note
\texttt{\textless{}-}, the directional ``arrow'' assignment symbol. This
creates new objects. We will be using this symbol a lot in the
tutorial.\footnote{Tip: typing \texttt{Alt -} on the keyboard will
  create it in RStudio. The equals sign \texttt{=} also works but is not
  recommended by R developers.}
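
As a minimal sketch of assignment (the object names here are made up for
illustration), the following lines create the same value with both
symbols and check that the results match:

\begin{verbatim}
# Create an object with the arrow assignment symbol (recommended)
radius_arrow <- 5

# The equals sign also creates an object
radius_equals = 5

# Compare the two objects, then print one by typing its name
identical(radius_arrow, radius_equals)
radius_arrow
\end{verbatim}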

To distinguish between prose and code, please be aware of the following
typographic conventions: R code (e.g. \texttt{plot(x, y)}) is written in
a \texttt{monospace} font and package names (e.g. \textbf{rgdal}) are
written in \textbf{bold}. Blocks of code such as:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{c}\NormalTok{(}\DecValTok{1}\NormalTok{:}\DecValTok{3}\NormalTok{, }\DecValTok{5}\NormalTok{)^}\DecValTok{2}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## [1]  1  4  9 25
\end{verbatim}

are compiled in-line: the \texttt{\#\#} indicates output from R.
Sometimes output from code in this tutorial is reduced to save space, so
do not be alarmed if R produces unexpected \texttt{warning} messages. In
a few cases we omit images to save space. This will be clear from the
comments. Images in this document are small and low-quality to enable
portability of the PDF. They should display better on your computer
screen and can be saved at any resolution.

The code presented here is not the only way to do things: we encourage
you to play to gain a deeper understanding of R. Do not worry, you
cannot `break' anything using R and all the input data can be re-loaded
if things do go wrong. As with learning to skate, you learn by falling,
and getting an \texttt{Error:} message in R is much less painful than
landing on one's face on concrete! We encourage \texttt{Error:}s - they
mean you are trying new things.

If you require help on any function, use the \texttt{help} function,
e.g. \texttt{help(plot)}. Because R users love being concise, this can
also be written as \texttt{?plot}. Feel free to use it at any point
you'd like more detail on a specific function (although R's help files
are famously cryptic for the uninitiated). Help on more general terms
can be found using the \texttt{??} symbol. To test this, try typing
\texttt{??regression}. For the most part, \emph{learning by doing} is a
good motto, so let's crack on and download some packages and then some
data.
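
A short sketch of the help commands mentioned above. It also uses
\texttt{help.search} and \texttt{example}, two further base R functions
that are not covered elsewhere in this tutorial:

\begin{verbatim}
help(plot)                 # open the help page for the plot function
?plot                      # shorthand for the line above
??regression               # search installed help pages for a term
help.search("regression")  # long form of ??regression
example(mean)              # run the examples from the help page of mean
\end{verbatim}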

\subsection{Prerequisites and
packages}\label{prerequisites-and-packages}

For this tutorial you need to install R, if you haven't already done so,
the latest version of which can be downloaded from
\href{http://cran.r-project.org/}{\url{http://cran.r-project.org/}}. A
number of R editors such as \href{http://www.rstudio.com/}{RStudio} can
be used to make R more user friendly, but these are not needed to
complete the tutorial.

R has a huge and growing number of spatial data packages. We recommend
taking a quick browse on R's main website:
\href{http://cran.r-project.org/web/views/Spatial.html}{\url{http://cran.r-project.org/web/views/Spatial.html}}.

The packages we will be using are \textbf{ggplot2}, \textbf{rgdal},
\textbf{rgeos}, \textbf{maptools}, \textbf{mapproj} and \textbf{ggmap}.
To test whether a package is installed, \textbf{ggplot2} for example,
enter \texttt{library(ggplot2)}. If there is no output from R, this is
good news: it means that the package has already been installed on your
computer. If you get an error message, it needs to be installed:
\texttt{install.packages("ggplot2")}. Packages are downloaded from CRAN
(the Comprehensive R Archive Network); if you are prompted to select a
`mirror', select one that is close to your home. Install these packages
now.
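
One possible way to check and install all six packages in one go is
sketched below. It assumes an internet connection and that every package
is still available from your chosen CRAN mirror; the object names
\texttt{pkgs} and \texttt{to\_install} are made up for illustration.

\begin{verbatim}
# Packages used in this tutorial
pkgs <- c("ggplot2", "rgdal", "rgeos", "maptools", "mapproj", "ggmap")

# Identify the packages that are not yet installed
to_install <- pkgs[!pkgs %in% rownames(installed.packages())]

# Install only the missing ones
if (length(to_install) > 0) install.packages(to_install)

# Load each package; character.only = TRUE lets library() accept strings
for (p in pkgs) library(p, character.only = TRUE)
\end{verbatim}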

\section{Part II: Spatial data in R}\label{part-ii-spatial-data-in-r}

\subsection{Starting the tutorial}\label{starting-the-tutorial}

Now that we have taken a look at R's syntax and installed the necessary
packages, we can start looking at some real spatial data. This second
part introduces some spatial files that we will download from the
internet. Plotting and interrogating spatial objects are central to spatial
data analysis in R, so we will focus on these elements in the next two
parts of the tutorial, before focussing on creating attractive maps in
Part IV.

\subsection{Downloading the data}\label{downloading-the-data}

Download the data for this tutorial now from:
\href{https://github.com/Robinlovelace/Creating-maps-in-R}{\url{https://github.com/Robinlovelace/Creating-maps-in-R}}.
Click on the ``Download ZIP'' button on the right hand side and once it
is downloaded unzip this to a new folder on your PC. Use the
\texttt{setwd} command to set the working directory to the folder where
the data is saved. If your username is ``username'' and you saved the
files into a folder called ``Creating-maps-in-R-master'' on your
Desktop, for example, you would type the following:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{setwd}\NormalTok{(}\StringTok{"C:/Users/username/Desktop/Creating-maps-in-R-master/"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

If you are working in RStudio, you can also set the working directory
from the menus: click ``Session'' in the top toolbar and select ``Set
working directory \textgreater{} choose directory''. Alternatively,
creating an RStudio project in the folder sets the working directory
automatically whenever the project is opened.
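
To confirm that the working directory is set as intended, a quick check
along the following lines may help. It assumes the unzipped folder
contains a \texttt{data} directory, as in the tutorial's repository.

\begin{verbatim}
getwd()                               # print the current working directory
list.files("data")                    # list the contents of the data folder
file.exists("data/london_sport.shp")  # TRUE if the shapefile can be found
\end{verbatim}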

It is also worth taking a look at the input data in your file browser
before opening them in R, to get a feel for them. You could try opening
the file ``london\_sport.shp'', within the ``data'' folder of the
project, in a GIS program such as QGIS (which can be freely downloaded
from the internet). Also note that shapefiles are composed of several
files for each object: you should be able to open ``london\_sport.dbf''
in a spreadsheet program such as LibreOffice Calc. Once you've understood
something of this input data and where it lives, it's time to open it in
R.

\subsection{Loading the spatial data}\label{loading-the-spatial-data}

Reading in spatial data, such as
\href{http://en.wikipedia.org/wiki/Shapefile}{shapefiles} (a common
geographical file format), is one of the most important steps in
handling spatial data with R. There are a number of ways to do this, the
most commonly used and versatile of which is \texttt{readOGR}. This
function, from the \textbf{rgdal} package, automatically extracts
information about the projection and the attributes of data.
\textbf{rgdal} is R's interface to the ``Geospatial Data Abstraction
Library (GDAL)'', which is used by other open source GIS packages such as
QGIS and enables R to handle a broader range of spatial data formats. If
you've not already \emph{installed} and loaded the \textbf{rgdal}
package (as described above for \textbf{ggplot2}) do so now:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(rgdal)}
\NormalTok{lnd_sport <-}\StringTok{ }\KeywordTok{readOGR}\NormalTok{(}\DataTypeTok{dsn =} \StringTok{"data"}\NormalTok{, }\StringTok{"london_sport"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## OGR data source with driver: ESRI Shapefile 
## Source: "data", layer: "london_sport"
## with 33 features and 4 fields
## Feature type: wkbPolygon with 2 dimensions
\end{verbatim}

In the code above \texttt{dsn} stands for ``data source name'' and is an
\emph{argument} of the \emph{function} \texttt{readOGR}. Note that each
new argument is separated by a comma. The \texttt{dsn} argument in this
case is a \emph{character string} (indicated by quote marks like this
one \texttt{"}) that specifies the directory where the data files are
stored. R functions have a default order of arguments, so \texttt{dsn =}
does not actually need to be typed for the command to run. If the data
were stored in the current working directory, for example, one could use
\texttt{readOGR(".", "london\_sport")}. For clarity, it is good practice
to include argument names, such as \texttt{dsn} when learning new
functions and we continue this tradition below.

The next argument is another \emph{character string}: simply the name of
the file required. There is no need to add a file extension (e.g.
\texttt{.shp}) in this case.
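
The difference between named and positional arguments can be sketched as
follows, assuming the working directory contains the \texttt{data}
folder. The object names \texttt{a} and \texttt{b} are made up for
illustration.

\begin{verbatim}
library(rgdal)

# Named arguments: each value is matched to its argument by name
a <- readOGR(dsn = "data", layer = "london_sport")

# Positional arguments: values are matched by their order
b <- readOGR("data", "london_sport")

# Show the argument names of readOGR and the order they come in
args(readOGR)
\end{verbatim}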

The files beginning \texttt{london\_sport} in the \texttt{data/}
\href{https://github.com/Robinlovelace/Creating-maps-in-R/tree/master/data}{directory}
contain the 2001 borough population and percentage participating in
sporting activities from the
\href{http://data.london.gov.uk/datastore/package/active-people-survey-kpi-data-borough}{active
people survey}. The boundary data is from the
\href{http://www.ordnancesurvey.co.uk/oswebsite/opendata/}{Ordnance
Survey}.

For information about how to load different types of spatial data, the
help documentation for \texttt{readOGR} is a good place to start. This
can be accessed from within R by typing \texttt{?readOGR}. For another
worked example, in which a GPS trace is loaded, please see Cheshire and
Lovelace (2014).

\subsection{Basic plotting}\label{basic-plotting}

We have now created a new spatial object called \texttt{lnd\_sport}
from the ``london\_sport'' shapefile. Spatial objects are made up of a
number of different \emph{slots}, mainly the attribute \emph{slot} and
the geometry \emph{slot}. The attribute \emph{slot} can be thought of as
an attribute table and the geometry \emph{slot} describes where the
spatial object (and its attributes) lies in space. Let's now analyse the
\texttt{lnd\_sport} object with some basic commands:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{head}\NormalTok{(lnd_sport@data, }\DataTypeTok{n =} \DecValTok{2}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
##   ons_label                 name Partic_Per Pop_2001
## 0      00AF              Bromley       21.7   295535
## 1      00BD Richmond upon Thames       26.6   172330
\end{verbatim}

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{mean}\NormalTok{(lnd_sport$Partic_Per)}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## [1] 20.05
\end{verbatim}

Take a look at this output and notice the table format of the data and
the column names. There are two important symbols at work in the above
block of code: the \texttt{@} symbol in the first line of code is used
to refer to the attribute \emph{slot} of the object. The \texttt{\$}
symbol refers to a specific attribute (a variable with a column name) in
the \texttt{data} \emph{slot}, which was identified from the result of
running the first line of code. If you are using RStudio, test out the
auto-completion functionality by hitting \texttt{tab} before completing
the command - this can save you a lot of time in the long run.

The \texttt{head} function in the first line of the code above simply
means ``show the first few lines of data'', i.e.~the head. Its default
is to output the first 6 rows of the dataset (try simply
\texttt{head(lnd\_sport@data)}), but we can specify the number of lines
with \texttt{n = 2} after the comma. The second line of the code above
calculates the mean value of the variable \texttt{Partic\_Per} (sports
participation per 100 people) across all of the zones in the
\texttt{lnd\_sport} object. To explore the \texttt{lnd\_sport} object
further, try typing \texttt{nrow(lnd\_sport)} and record how many zones
the dataset contains. You can also try \texttt{ncol(lnd\_sport)}.
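
A few further base R commands for exploring the attribute data are
sketched below. They assume \texttt{lnd\_sport} has been loaded as
described above.

\begin{verbatim}
dim(lnd_sport@data)            # number of rows and columns
names(lnd_sport@data)          # column names
summary(lnd_sport$Partic_Per)  # minimum, quartiles, mean and maximum
tail(lnd_sport@data, n = 3)    # the last three rows
\end{verbatim}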

Now we have seen something of the attribute \emph{slot} of the spatial
object, let us look at its \emph{geometry}, which describes where the
polygons are located in space:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{plot}\NormalTok{(lnd_sport) }\CommentTok{# not shown in tutorial - try it on your computer}
\end{Highlighting}
\end{Shaded}

\texttt{plot} is one of the most useful functions in R, as it changes
its behaviour depending on the input data (this is called
\emph{polymorphism} by computer scientists). Inputting another object
such as \texttt{plot(lnd\_sport@data)} will generate an entirely
different type of plot. Thus R is intelligent at guessing what you want
to do with the data you provide it with.
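
To see why the two calls behave differently, it can help to inspect the
class of each input, since R chooses the plotting method from the class.
This is only a sketch: it assumes \texttt{lnd\_sport} is loaded and that
\textbf{sp} (attached together with \textbf{rgdal}) has registered its
plot methods.

\begin{verbatim}
class(lnd_sport)       # the spatial class of the whole object
class(lnd_sport@data)  # a plain data frame, so another method is used
showMethods("plot")    # formal (S4) plot methods, e.g. for spatial classes
methods(plot)          # plot methods for other classes, e.g. data frames
\end{verbatim}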

R has powerful subsetting capabilities that can be accessed very
concisely using square brackets, as shown in the following example:

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{#select rows from attribute slot of lnd_sport object, where sports participation is less than 15.}
\NormalTok{lnd_sport@data[lnd_sport$Partic_Per <}\StringTok{ }\DecValTok{15}\NormalTok{, ]}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
##    ons_label           name Partic_Per Pop_2001
## 17      00AQ         Harrow       14.8   206822
## 21      00BB         Newham       13.1   243884
## 32      00AA City of London        9.1     7181
\end{verbatim}

The above line of code asked R to select rows from the
\texttt{lnd\_sport} object, where sports participation is lower than 15,
in this case rows 17, 21 and 32, which are Harrow, Newham and the City
of London respectively. The square brackets work as follows: anything
before the comma refers to the rows that will be selected, anything
after the comma refers to the columns that should be returned.
For example if the data frame had 1000 columns and you were only
interested in the first two columns you could specify \texttt{1:2} after
the comma. The ``:'' symbol simply means ``to'', i.e.~columns 1 to 2.
Try experimenting with the square brackets notation (e.g.~guess the
result of \texttt{lnd\_sport@data{[}1:2, 1:3{]}} and test it): it will
be useful.
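
The same notation works on any data frame. The sketch below builds a
small one from five of the rows printed earlier in this tutorial (the
object name \texttt{boroughs} is made up for illustration), so it can be
run without the spatial data:

\begin{verbatim}
# Toy data frame using values shown in the output above
boroughs <- data.frame(
  name = c("Bromley", "Richmond upon Thames", "Harrow", "Newham",
           "City of London"),
  Partic_Per = c(21.7, 26.6, 14.8, 13.1, 9.1),
  Pop_2001 = c(295535, 172330, 206822, 243884, 7181)
)

boroughs[boroughs$Partic_Per < 15, ]          # rows meeting a condition
boroughs[1:2, ]                               # first two rows, all columns
boroughs[, c("name", "Pop_2001")]             # all rows, two named columns
boroughs[boroughs$Pop_2001 > 200000, "name"]  # one column for selected rows
\end{verbatim}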

So far we have been interrogating only the attribute \emph{slot}
(\texttt{@data}) of the \texttt{lnd\_sport} object, but the square
brackets can also be used to subset spatial objects, i.e.~the geometry
\emph{slot}. Using the same logic as before try to plot a subset of
zones with high sports participation.

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{#plot zones from sports object where sports participation is greater than 25.}
\KeywordTok{plot}\NormalTok{(lnd_sport[lnd_sport$Partic_Per >}\StringTok{ }\DecValTok{25}\NormalTok{, ]) }\CommentTok{# output not shown in tutorial}
\end{Highlighting}
\end{Shaded}

This is useful, but it would be great to see these sporty areas in
context. To do this, simply use the \texttt{add = TRUE} argument after
the initial plot. (\texttt{add = T} would also work, but we like to
spell things out in this tutorial for clarity). What does the
\texttt{col} argument refer to in the block below? It should be obvious
(see figure 2).

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{plot}\NormalTok{(lnd_sport)}
\KeywordTok{plot}\NormalTok{(lnd_sport[lnd_sport$Partic_Per >}\StringTok{ }\DecValTok{25}\NormalTok{,], }\DataTypeTok{col =} \StringTok{"blue"}\NormalTok{, }\DataTypeTok{add =} \OtherTok{TRUE}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-10.pdf}
\caption{Preliminary plot of London with areas of high sports
participation highlighted in blue}
\end{figure}

Congratulations! You have just interrogated and visualised a spatial
object: where are areas with high levels of sports participation in
London? The map tells us. Do not worry for now about the intricacies of
how this was achieved: you have learned vital basics of how R works as a
language; we will cover this in more detail in subsequent sections.

While we are on the topic of loading data, it is worth pointing out that
R can save and load data efficiently into its own data format
(\texttt{.RData}). Try \texttt{save(lnd\_sport, file = "sport.RData")}
and see what happens. If you type \texttt{rm(lnd\_sport)} (which removes
the object) and then \texttt{load("sport.RData")} you should see how
this works. \texttt{lnd\_sport} will disappear from the workspace and
then reappear.
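
The steps above can be sketched as a short script. As an assumption made
for illustration, the file is written to R's temporary directory rather
than the working directory, and \texttt{exists} is used to check whether
the object is present at each stage.

\begin{verbatim}
# Path to a file in R's temporary directory
rdata_file <- file.path(tempdir(), "sport.RData")

save(lnd_sport, file = rdata_file)  # write the object to disk
rm(lnd_sport)                       # remove it from the workspace
exists("lnd_sport")                 # check: should now be FALSE
load(rdata_file)                    # read the object back in
exists("lnd_sport")                 # check: should now be TRUE
\end{verbatim}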

\subsection{Attribute data}\label{attribute-data}

All shapefiles have both attribute table and geometry data. These are
automatically loaded with \texttt{readOGR}. The loaded attribute data
can be treated the same as an R
\href{http://www.statmethods.net/input/datatypes.html}{data frame}.

R deliberately hides the geometry of spatial data unless you print the
entire object (try typing \texttt{print(lnd\_sport)}). Let's take a look
at the headings of \texttt{lnd\_sport}, using the following command:
\texttt{names(lnd\_sport)}. Remember, the attribute data contained in
spatial objects are kept in a `slot' that can be accessed using the
\texttt{@} symbol: \texttt{lnd\_sport@data}. This is useful if you do
not wish to work with the spatial components of the data at all times.

Type \texttt{summary(lnd\_sport)} to get some additional information
about the data object. Spatial objects in R contain much additional
information:

\begin{verbatim}
summary(lnd_sport)

## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x 503571.2 561941.1
## y 155850.8 200932.5
## Is projected: TRUE
## proj4string :
## [+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 ....]
\end{verbatim}

The above output tells us that \texttt{lnd\_sport} is a special spatial
class, in this case a \texttt{SpatialPolygonsDataFrame}, meaning it is
composed of various polygons, each of which has attributes. This is the
typical class of data found in administrative zones. The coordinates
tell us what the maximum and minimum x and y values are, for plotting.
Finally, we are told something of the coordinate reference system with
the \texttt{Is projected} and \texttt{proj4string} lines. In this case,
we have a projected system, which means it is a Cartesian reference
system, relative to some point on the surface of the Earth. We will
cover reprojecting data in the next part of the tutorial.
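
Each element of the summary can also be requested on its own. The
following sketch uses accessor functions from the \textbf{sp} package
and assumes \texttt{lnd\_sport} is loaded:

\begin{verbatim}
class(lnd_sport)         # the class of the object
bbox(lnd_sport)          # bounding box: minimum and maximum x and y
is.projected(lnd_sport)  # TRUE if the CRS is projected
proj4string(lnd_sport)   # the CRS as a character string
\end{verbatim}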

\section{Part III: Manipulating spatial
data}\label{part-iii-manipulating-spatial-data}

It is all very well being able to load and interrogate spatial data in
R, but to compete with modern GIS packages, R must also be able to
modify these spatial objects (see
`\href{https://github.com/Pakillo/R-GIS-tutorial}{using R as a GIS}'). R
has a wide range of very powerful functions for this, many of which
reside in additional packages alluded to in the introduction.

This course is introductory so only commonly required data manipulation
tasks, \emph{reprojecting} and \emph{joining/clipping}, are covered here.
We will look at joining non-spatial data to our spatial object. We will
then cover spatial joins, whereby data is joined to other datasets based
on spatial location.

\subsection{Changing projection}\label{changing-projection}

Before undertaking spatial queries of an object, it is useful to know
the \emph{coordinate reference system} (CRS) it uses. You may have
noticed the word \texttt{proj4string} in the summary of the
\texttt{lnd\_sport} object above. This represents its CRS mathematically.
In some spatial data files, no CRS is specified or, worse, an incorrect
CRS value is given. Provided the correct CRS is known, this can be
righted with a single line:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{proj4string}\NormalTok{(lnd_sport) <-}\StringTok{ }\KeywordTok{CRS}\NormalTok{(}\StringTok{"+init=epsg:27700"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

R issues a warning when changing the CRS in this way to ensure the user
knows that they are simply changing the CRS, not \emph{reprojecting} the
data. R uses EPSG codes to refer to different coordinate
reference systems. \texttt{27700} is the code for British National Grid.
A commonly used geographical (`lat/lon') CRS is `WGS84', whose EPSG code
is \texttt{4326}. The following code shows how to search the list of
available EPSG codes and create a new version of \texttt{lnd\_sport} in
WGS84:\footnote{Note: entering \texttt{projInfo()} will provide
  additional CRS options available from \textbf{rgdal}.}

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{EPSG <-}\StringTok{ }\KeywordTok{make_EPSG}\NormalTok{() }\CommentTok{# create data frame of available EPSG codes}
\NormalTok{EPSG[}\KeywordTok{grepl}\NormalTok{(}\StringTok{"WGS 84$"}\NormalTok{, EPSG$note), ] }\CommentTok{# search for WGS 84 code }
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
##     code     note                                prj4
## 249 4326 # WGS 84 +proj=longlat +datum=WGS84 +no_defs
\end{verbatim}

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_sport_wgs84 <-}\StringTok{ }\KeywordTok{spTransform}\NormalTok{(lnd_sport, }\KeywordTok{CRS}\NormalTok{(}\StringTok{"+init=epsg:4326"}\NormalTok{)) }\CommentTok{# reproject}
\end{Highlighting}
\end{Shaded}

The above code uses the function \texttt{spTransform}, from the
\textbf{sp} package, to convert the \texttt{lnd\_sport} object into a
new form, with the Coordinate Reference System (CRS) specified as WGS84.
The different EPSG codes are a bit of a hassle to remember but you can
search for them at
\href{http://spatialreference.org/}{spatialreference.org}.
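
One way to check that the reprojection has worked is to compare the two
objects, as in the sketch below. It assumes both \texttt{lnd\_sport} and
\texttt{lnd\_sport\_wgs84} exist in the workspace.

\begin{verbatim}
bbox(lnd_sport)                # extent in British National Grid metres
bbox(lnd_sport_wgs84)          # extent in degrees of longitude and latitude
is.projected(lnd_sport_wgs84)  # FALSE: WGS84 is a geographic CRS
proj4string(lnd_sport_wgs84)   # the new CRS as a character string
\end{verbatim}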

\subsection{Attribute joins}\label{attribute-joins}

Attribute joins are used to link additional pieces of information to our
polygons. In the \texttt{lnd\_sport} object, for example, we have 4
attribute variables - that can be found by typing
\texttt{names(lnd\_sport)}. But what happens when we want to add an
additional variable from an external data table? We will use the example
of recorded crimes by borough to demonstrate this.

To reaffirm our starting point, let's re-load the ``london\_sport''
shapefile as a new object and plot it. This is identical to the
\texttt{lnd\_sport} object in the first instance, but we will give it a
new name, in case we ever need to re-use \texttt{lnd\_sport}. We will
call this new object \texttt{lnd}, short for London:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(rgdal) }\CommentTok{# ensure rgdal is loaded}
\CommentTok{# Create new object called "lnd" from "london_sport" shapefile}
\NormalTok{lnd <-}\StringTok{ }\KeywordTok{readOGR}\NormalTok{(}\DataTypeTok{dsn =} \StringTok{"data"}\NormalTok{, }\StringTok{"london_sport"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## OGR data source with driver: ESRI Shapefile 
## Source: "data", layer: "london_sport"
## with 33 features and 4 fields
## Feature type: wkbPolygon with 2 dimensions
\end{verbatim}

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{plot}\NormalTok{(lnd) }\CommentTok{# plot the lnd object}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-14.pdf}
\caption{Plot of London}
\end{figure}

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{nrow}\NormalTok{(lnd) }\CommentTok{# return the number of rows}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## [1] 33
\end{verbatim}

The non-spatial data we are going to join to the \texttt{lnd} object
contains records on crimes in London. This is stored in a
comma-delimited (\texttt{.csv}) file called
``mps-recordedcrime-borough''. Viewing the
\href{https://raw.githubusercontent.com/Robinlovelace/Creating-maps-in-R/master/data/mps-recordedcrime-borough.csv}{file}
locally shows that each row represents a single reported crime. We are
going to use a function called \texttt{aggregate} to pre-process these
records, ready to join to our spatial \texttt{lnd} dataset. A new object
called \texttt{crime\_data} is created to store this data.

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Create and look at new crime_data object}
\NormalTok{crime_data <-}\StringTok{ }\KeywordTok{read.csv}\NormalTok{(}\StringTok{"data/mps-recordedcrime-borough.csv"}\NormalTok{,}
  \DataTypeTok{fileEncoding =} \StringTok{"UCS-2LE"}\NormalTok{)}

\KeywordTok{head}\NormalTok{(crime_data) }\CommentTok{# display first 6 lines}
\KeywordTok{summary}\NormalTok{(crime_data$MajorText) }\CommentTok{# summary of crime type}

\CommentTok{# Extract "Theft & Handling" crimes and save}
\NormalTok{crime_theft <-}\StringTok{ }\NormalTok{crime_data[crime_data$MajorText ==}\StringTok{ "Theft & Handling"}\NormalTok{, ]}
\KeywordTok{head}\NormalTok{(crime_theft, }\DecValTok{2}\NormalTok{) }\CommentTok{# take a look at the result (replace 2 with 10 to see more rows)}

\CommentTok{# Calculate the sum of the crime count for each district and save result as a new object}
\NormalTok{crime_ag <-}\StringTok{ }\KeywordTok{aggregate}\NormalTok{(CrimeCount ~}\StringTok{ }\NormalTok{Spatial_DistrictName, }\DataTypeTok{FUN =} \NormalTok{sum,}
                     \DataTypeTok{data =} \NormalTok{crime_theft)}
\CommentTok{# Show the first two rows of the aggregated crime data}
\KeywordTok{head}\NormalTok{(crime_ag, }\DecValTok{2}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

There is a lot going on in the above block of code and you should not
expect to understand all of it upon first try: simply typing the
commands and thinking briefly about the outputs is all that is needed at
this stage to improve your intuitive understanding of R. It is worth
pointing out a few things that you may not have seen before that will
likely be useful in the future:

\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  In the first line of code the \texttt{fileEncoding} argument is used.
  This is rarely necessary, but in this case the file is stored in an
  unusual text encoding (UCS-2LE). Nine times out of ten you can omit
  this argument but it is worth knowing about.
\item
  Square brackets containing a logical condition are used to select only
  those observations that meet that condition, in this case all crimes
  classed as ``Theft \& Handling''.
\item
  The \texttt{\textasciitilde{}} symbol means ``by'': we aggregated the
  \texttt{CrimeCount} variable by the district name.
\end{itemize}
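
The subsetting and aggregation steps listed above can be tried on a
small, made-up data frame. This is only a sketch: the \texttt{toy} object,
its column names and its values are invented for illustration and are
not taken from the crime dataset.

```r
# Hypothetical toy data (not the real crime records)
toy <- data.frame(
  district = c("A", "A", "B", "B", "B"),
  type     = c("Theft", "Burglary", "Theft", "Theft", "Burglary"),
  count    = c(3, 1, 2, 5, 4)
)

# Logical subsetting: keep only the rows where the condition is TRUE
toy_theft <- toy[toy$type == "Theft", ]

# Formula interface: sum `count` by `district`
toy_ag <- aggregate(count ~ district, data = toy_theft, FUN = sum)
print(toy_ag)
```

Because the toy data are so small, the result can be checked by eye
against the input rows, which is a useful habit before running the same
commands on a larger dataset.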

Now that we have crime data at the borough level
(\texttt{Spatial\_DistrictName}), the challenge is to join it to the
\texttt{lnd} object. We will base our join on the
\texttt{Spatial\_DistrictName} variable from the \texttt{crime\_ag}
object and the \texttt{name} variable from the \texttt{lnd} object. It
is not always straightforward to join objects based on names as the
names do not always match. Let us see which names in the
\texttt{crime\_ag} object match the spatial data object, \texttt{lnd}:

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Compare the name column in lnd to Spatial_DistrictName column in crime_ag to see which rows match.}
\NormalTok{lnd$name %in%}\StringTok{ }\NormalTok{crime_ag$Spatial_DistrictName}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
##  [1]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [12]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE
## [23]  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE  TRUE FALSE
\end{verbatim}

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Return rows which do not match}
\NormalTok{lnd$name[!lnd$name %in%}\StringTok{ }\NormalTok{crime_ag$Spatial_DistrictName]}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## [1] City of London
## 33 Levels: Barking and Dagenham Barnet Bexley Brent Bromley ... Westminster
\end{verbatim}

The first line of code above uses the \texttt{\%in\%} operator to
identify which values in \texttt{lnd\$name} are also contained in the
names of the crime data. The results indicate that all but one of the
borough names match. The second line of code tells us that it is City
of London that has no match in the crime data. Listing the district
names used in the crime data shows why: level 25 is \texttt{NULL}.

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# List the district names used in the crime data}
\KeywordTok{levels}\NormalTok{(crime_ag$Spatial_DistrictName)}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
##  [1] "Barking and Dagenham"   "Barnet"                
##  [3] "Bexley"                 "Brent"                 
##  [5] "Bromley"                "Camden"                
##  [7] "Croydon"                "Ealing"                
##  [9] "Enfield"                "Greenwich"             
## [11] "Hackney"                "Hammersmith and Fulham"
## [13] "Haringey"               "Harrow"                
## [15] "Havering"               "Hillingdon"            
## [17] "Hounslow"               "Islington"             
## [19] "Kensington and Chelsea" "Kingston upon Thames"  
## [21] "Lambeth"                "Lewisham"              
## [23] "Merton"                 "Newham"                
## [25] "NULL"                   "Redbridge"             
## [27] "Richmond upon Thames"   "Southwark"             
## [29] "Sutton"                 "Tower Hamlets"         
## [31] "Waltham Forest"         "Wandsworth"            
## [33] "Westminster"
\end{verbatim}

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Rename level 25 (NULL) in crime_ag to the unmatched name in lnd, as suggested by the results above}
\KeywordTok{levels}\NormalTok{(crime_ag$Spatial_DistrictName)[}\DecValTok{25}\NormalTok{] <-}
\StringTok{  }\KeywordTok{as.character}\NormalTok{(lnd$name[!lnd$name %in%}\StringTok{ }\NormalTok{crime_ag$Spatial_DistrictName])}
\NormalTok{lnd$name %in%}\StringTok{ }\NormalTok{crime_ag$Spatial_DistrictName }\CommentTok{# now all names match}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
##  [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [15] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [29] TRUE TRUE TRUE TRUE TRUE
\end{verbatim}

The above code block first identified the row with the faulty name and
then renamed the level to match the \texttt{lnd} dataset. Note that we
could not rename the variable directly, as it is stored as a factor.
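
The matching and renaming steps can be rehearsed on two short vectors.
The sketch below uses invented names (\texttt{zone\_names},
\texttt{crime\_zone}) rather than the London data, and selects the
faulty level by its label instead of by its position, which is an
assumption made here to keep the example independent of how the levels
happen to be sorted.

```r
# Hypothetical toy vectors (not the London data)
zone_names <- c("North", "South", "East")
crime_zone <- factor(c("North", "NULL", "East"))

# Which zone names are missing from the crime data?
missing_name <- zone_names[!zone_names %in% crime_zone]
print(missing_name)

# Rename the offending factor level rather than the values themselves
levels(crime_zone)[levels(crime_zone) == "NULL"] <- missing_name
print(all(zone_names %in% crime_zone))
```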

We are now ready to join the datasets. It is recommended to use the
\texttt{join} function in the \textbf{plyr} package but the
\texttt{merge} function could equally be used. Note that when we ask for
help on a function from a package that is not loaded, R reports that no
documentation was found, indicating we need to load the package:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{help}\NormalTok{(join) }\CommentTok{# no documentation found: plyr is not loaded yet}
\KeywordTok{library}\NormalTok{(plyr)}
\KeywordTok{help}\NormalTok{(join) }\CommentTok{# the help page is now displayed}
\end{Highlighting}
\end{Shaded}

The documentation for \texttt{join} will be displayed if the
\textbf{plyr} package is loaded (if not, install it if necessary and
then load it). The \texttt{join} function requires all joining
variables to have the same name, so we will rename the variable to make
the join work:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{head}\NormalTok{(lnd$name)}
\KeywordTok{head}\NormalTok{(crime_ag$Spatial_DistrictName) }\CommentTok{# the variables to join}
\NormalTok{crime_ag <-}\StringTok{ }\KeywordTok{rename}\NormalTok{(crime_ag, }\DataTypeTok{replace =} \KeywordTok{c}\NormalTok{(}\StringTok{"Spatial_DistrictName"} \NormalTok{=}\StringTok{ "name"}\NormalTok{))}
\KeywordTok{head}\NormalTok{(}\KeywordTok{join}\NormalTok{(lnd@data, crime_ag)) }\CommentTok{# test it works}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## Joining by: name
\end{verbatim}

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd@data <-}\StringTok{ }\KeywordTok{join}\NormalTok{(lnd@data, crime_ag)}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## Joining by: name
\end{verbatim}

Take a look at the \texttt{lnd@data} object. You should see new
variables added, meaning the attribute join was successful.
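
Row order matters when the result is written back to
\texttt{lnd@data}, because each row has to stay aligned with its
polygon. The following base R sketch, using two invented data frames
(\texttt{attrs} and \texttt{crime}), shows an order-preserving join
built with \texttt{match} and contrasts it with the default behaviour of
\texttt{merge}, which sorts the result by the key column. It is an
illustration of the idea, not a replacement for the \texttt{join} call
used above.

```r
# Hypothetical attribute table, in the same row order as the polygons
attrs <- data.frame(name = c("C", "A", "B"), pop = c(30, 10, 20))
# Hypothetical table to join, stored in a different order
crime <- data.frame(name = c("A", "B", "C"), CrimeCount = c(1, 2, 3))

# Order-preserving left join using match()
joined <- attrs
joined$CrimeCount <- crime$CrimeCount[match(attrs$name, crime$name)]
print(joined)

# merge() sorts by the key by default, which changes the row order
merged <- merge(attrs, crime, by = "name")
print(merged$name)
```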

\subsection{Clipping and spatial
joins}\label{clipping-and-spatial-joins}

In addition to joining by zone name, it is also possible to do
\href{http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html\#//00080000000q000000}{spatial
joins} in R. There are three main varieties: many-to-one, where the
values of many intersecting objects contribute to a new variable in the
main table, one-to-many, or one-to-one. Because boroughs in London are
quite large, we will conduct a many-to-one spatial join. We will be
using transport infrastructure points such as tube stations and
roundabouts as the spatial data to join, with the aim of finding out
how many are found in each London borough.

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(rgdal)}
\CommentTok{#create new stations object using the "lnd-stns" shapefile.}
\NormalTok{stations <-}\StringTok{ }\KeywordTok{readOGR}\NormalTok{(}\DataTypeTok{dsn =} \StringTok{"data"}\NormalTok{, }\DataTypeTok{layer =} \StringTok{"lnd-stns"}\NormalTok{)}
\KeywordTok{proj4string}\NormalTok{(stations) }\CommentTok{# this is the full geographical detail.}
\KeywordTok{proj4string}\NormalTok{(lnd)}
\CommentTok{#return the bounding box of the stations object}
\KeywordTok{bbox}\NormalTok{(stations)}
\CommentTok{#return the bounding box of the lnd object}
\KeywordTok{bbox}\NormalTok{(lnd)}
\end{Highlighting}
\end{Shaded}

The above code loads the data correctly, but also shows that there are
problems with it: the Coordinate Reference System (CRS) of
\texttt{stations} differs from that of our \texttt{lnd} object. OSGB
1936 (or \href{http://spatialreference.org/ref/epsg/27700/}{EPSG 27700})
is the official CRS for the UK, so we will convert the object to this:

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# Create reprojected stations object}
\NormalTok{stations27700 <-}\StringTok{ }\KeywordTok{spTransform}\NormalTok{(stations, }\DataTypeTok{CRSobj =} \KeywordTok{CRS}\NormalTok{(}\KeywordTok{proj4string}\NormalTok{(lnd)))}
\NormalTok{stations <-}\StringTok{ }\NormalTok{stations27700 }\CommentTok{# overwrite the stations object with stations27700}
\KeywordTok{rm}\NormalTok{(stations27700) }\CommentTok{# remove the stations27700 object to clear up}
\KeywordTok{plot}\NormalTok{(lnd) }\CommentTok{# plot London for context (see figure 4 below)}
\KeywordTok{points}\NormalTok{(stations) }\CommentTok{# overlay the station points}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-22.pdf}
\caption{Sampling and plotting stations}
\end{figure}

Now we can clearly see that the \texttt{stations} points overlay the
boroughs. The problem is that the spatial extent of \texttt{stations} is
greater than that of \texttt{lnd}. We will take a spatially determined
subset of the \texttt{stations} object that falls inside Greater London.
This is \emph{clipping}.

Two functions can be used to clip \texttt{stations} so that only those
falling within London boroughs are retained: \texttt{sp::over}, and
\texttt{rgeos::gIntersects} (the word preceding the \texttt{::} symbol
refers to the package which the function is from). Use \texttt{?}
followed by the function to get help on each. Whether
\texttt{gIntersects} or \texttt{over} is needed depends on the spatial
data classes being compared (Bivand et al. 2013).

In this tutorial we will use the \texttt{over} function as it is easiest
to use. In fact, it can be called just by using square brackets:

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{stations <-}\StringTok{ }\NormalTok{stations[lnd, ]}
\KeywordTok{plot}\NormalTok{(stations) }\CommentTok{# test the clip succeeded (see figure 5)}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-23.pdf}
\caption{The clipped stations dataset}
\end{figure}

The above line of code says: ``output all \texttt{stations} within the
\texttt{lnd} object bounds''. This is an incredibly concise way of
clipping and has the added advantage of being consistent with R's syntax
for non-spatial subsetting. As a check that it worked, only stations
within the London boroughs appear in the plot.

\texttt{gIntersects} can achieve the same result, but with more lines of
code (see
\href{http://www.rpubs.com/RobinLovelace/11796}{www.rpubs.com/RobinLovelace}
for more on this). It may seem confusing that two different functions
can be used to generate the same result. However, this is a common issue
in R; the question is finding the most appropriate solution.

In its less concise form (without use of square brackets), \texttt{over}
takes two main input arguments: the target layer (the layer to be
altered) and the source layer by which the target layer is to be
clipped. The output of \texttt{over} is a data frame with one row for
each feature in the target layer (in this case \texttt{stations}), in
which the rows for points that fall outside the zone of interest are set
to \texttt{NA} (``not available''). We can use this to make a subset of
the original points, remembering the square bracket notation described
in the first section. We create a new object, \texttt{sel} (short for
``selection''), recording which borough, if any, each point falls in:

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{sel <-}\StringTok{ }\KeywordTok{over}\NormalTok{(stations, lnd)}
\NormalTok{stations <-}\StringTok{ }\NormalTok{stations[!}\KeywordTok{is.na}\NormalTok{(sel[,}\DecValTok{1}\NormalTok{]),]}
\end{Highlighting}
\end{Shaded}

Typing \texttt{summary(sel)} should provide insight into how this
worked: it is a data frame with 1801 NA values, representing points
outside of the London polygons. Note that the preceding two lines of
code are equivalent to the single line of code,
\texttt{stations \textless{}- stations{[}lnd, {]}}. The next section
demonstrates spatial aggregation, a more advanced version of spatial
subsetting.
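
The logic of selecting rows that are not \texttt{NA} can be illustrated
without any spatial packages. The sketch below is a simplified analogue
only: it uses invented points and a rectangular zone (a stand-in for a
polygon), and the objects \texttt{pts}, \texttt{zone} and
\texttt{sel\_toy} are hypothetical. It mimics the shape of the output
described above (a value for points inside, \texttt{NA} otherwise) and
then subsets on it.

```r
# Hypothetical points and a rectangular "zone" (a stand-in for a polygon)
pts <- data.frame(id = 1:5,
                  x = c(1, 4, 6, 9, 12),
                  y = c(2, 5, 5, 1, 7))
zone <- c(xmin = 3, xmax = 10, ymin = 0, ymax = 6)

# A zone label for points inside the rectangle, NA for points outside
inside <- pts$x >= zone[["xmin"]] & pts$x <= zone[["xmax"]] &
          pts$y >= zone[["ymin"]] & pts$y <= zone[["ymax"]]
sel_toy <- ifelse(inside, "zone1", NA)

# Keep only the points that received a non-NA value
pts_clipped <- pts[!is.na(sel_toy), ]
print(pts_clipped$id)
```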

\subsection{Spatial aggregation}\label{spatial-aggregation}

As with R's very terse code for spatial subsetting, the base function
\texttt{aggregate} (which provides summaries of variables based on some
grouping variable) also behaves differently when the inputs are spatial
objects.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{stations_agg <-}\StringTok{ }\KeywordTok{aggregate}\NormalTok{(}\DataTypeTok{x =} \NormalTok{stations[}\StringTok{"CODE"}\NormalTok{], }\DataTypeTok{by =} \NormalTok{lnd, }\DataTypeTok{FUN =} \NormalTok{length)}
\KeywordTok{head}\NormalTok{(stations_agg@data)}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
##   CODE
## 0   48
## 1   22
## 2   43
## 3   18
## 4   12
## 5   13
\end{verbatim}

The above code performs a number of steps in just one line:

\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  \texttt{aggregate} identifies which \texttt{lnd} polygon (borough)
  each point in \texttt{stations} is located in and groups the points
  accordingly. The
  use of the syntax \texttt{stations{[}"CODE"{]}} tells R that we are
  interested in the spatial data from \texttt{stations} and its
  \texttt{CODE} variable (any variable could have been used here as we
  are merely counting how many points exist).
\item
  It counts the number of \texttt{stations} points in each borough,
  using the function \texttt{length}.
\item
  A new spatial object is created, with the same geometry as
  \texttt{lnd}, and assigned the name \texttt{stations\_agg}, the count
  of stations.
\end{itemize}
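
Once each point has been assigned to a zone, the grouping and counting
steps listed above are ordinary grouped summaries. The following base R
sketch assumes the zone membership is already known (the vector
\texttt{zone\_of\_point} and the values in \texttt{value} are invented)
and uses \texttt{tapply} in place of the spatial \texttt{aggregate}
method.

```r
# Hypothetical zone membership for eight points; zone "D" contains no points
zone_of_point <- factor(c("A", "B", "A", "C", "A", "B", "A", "C"),
                        levels = c("A", "B", "C", "D"))
value <- c(5, 3, 8, 1, 4, 6, 7, 2)

# Count of points per zone (the analogue of FUN = length)
n_points <- tapply(value, zone_of_point, length)
# Mean value per zone (the analogue of FUN = mean)
mean_value <- tapply(value, zone_of_point, mean)
print(n_points)
print(mean_value)
```

In this sketch the empty zone receives \texttt{NA} rather than zero, so
it is worth checking for missing values before using such counts.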

It may seem confusing that the result of the \texttt{aggregate} function
is a new shape, not a list of numbers. This is because values are
assigned to the elements within the \texttt{lnd} object. To extract the raw count
data, one could enter \texttt{stations\_agg\$CODE}. This variable could
be added to the original \texttt{lnd} object as a new field, as follows:

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd$n_points <-}\StringTok{ }\NormalTok{stations_agg$CODE}
\end{Highlighting}
\end{Shaded}

As shown below, the spatial implementation of \texttt{aggregate} can
provide summary statistics of variables, as well as simple counts. In
this case we take the variable \texttt{NUMBER} and find its mean value
for the stations in each borough.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_n <-}\StringTok{ }\KeywordTok{aggregate}\NormalTok{(stations[}\StringTok{"NUMBER"}\NormalTok{] , }\DataTypeTok{by =} \NormalTok{lnd, }\DataTypeTok{FUN =} \NormalTok{mean)}
\end{Highlighting}
\end{Shaded}

For an optional advanced task, let us analyse and plot the result.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{q <-}\StringTok{ }\KeywordTok{cut}\NormalTok{(lnd_n$NUMBER, }\DataTypeTok{breaks =} \KeywordTok{quantile}\NormalTok{(lnd_n$NUMBER), }\DataTypeTok{include.lowest =} \OtherTok{TRUE}\NormalTok{)}
\KeywordTok{summary}\NormalTok{(q)}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## [1.82e+04,1.94e+04] (1.94e+04,1.99e+04] (1.99e+04,2.05e+04] 
##                   9                   8                   8 
##  (2.05e+04,2.1e+04] 
##                   8
\end{verbatim}

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{clr <-}\StringTok{ }\KeywordTok{as.character}\NormalTok{(}\KeywordTok{factor}\NormalTok{(q, }\DataTypeTok{labels =} \KeywordTok{paste0}\NormalTok{(}\StringTok{"grey"}\NormalTok{, }\KeywordTok{seq}\NormalTok{(}\DecValTok{20}\NormalTok{, }\DecValTok{80}\NormalTok{, }\DecValTok{20}\NormalTok{))))}
\KeywordTok{plot}\NormalTok{(lnd_n, }\DataTypeTok{col =} \NormalTok{clr) }\CommentTok{# plot (not shown in printed tutorial)}
\KeywordTok{legend}\NormalTok{(}\DataTypeTok{legend =} \KeywordTok{paste0}\NormalTok{(}\StringTok{"q"}\NormalTok{, }\DecValTok{1}\NormalTok{:}\DecValTok{4}\NormalTok{), }\DataTypeTok{fill =} \KeywordTok{paste0}\NormalTok{(}\StringTok{"grey"}\NormalTok{, }\KeywordTok{seq}\NormalTok{(}\DecValTok{20}\NormalTok{, }\DecValTok{80}\NormalTok{, }\DecValTok{20}\NormalTok{)), }\StringTok{"topright"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-29.pdf}
\caption{Choropleth map of mean values of stations in each borough}
\end{figure}

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{areas <-}\StringTok{ }\KeywordTok{sapply}\NormalTok{(lnd_n@polygons, function(x) x@area)}
\end{Highlighting}
\end{Shaded}

This results in a simple choropleth map and a new vector containing the
area of each borough (the basis for figure 6). As an additional step,
try comparing the area of each borough with the mean \texttt{NUMBER}
value of the \texttt{stations} points within it:
\texttt{plot(lnd\_n\$NUMBER, areas)}.
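
The classification behind the choropleth map (quantile breaks, then one
grey shade per class) can be checked on a short numeric vector. The
values in \texttt{vals} are invented, and indexing the shades by the
integer class code is one possible way of mapping classes to colours,
used here as an assumption for brevity.

```r
# Hypothetical values, one per zone
vals <- c(12, 7, 3, 18, 9, 15, 1, 6)

# Quantile breaks (0%, 25%, 50%, 75%, 100%) give four classes
brks <- quantile(vals)
cls <- cut(vals, breaks = brks, include.lowest = TRUE)

# Map each class to a grey shade by indexing with the integer class code
shades <- paste0("grey", seq(20, 80, 20))
clr_toy <- shades[as.integer(cls)]
print(table(cls))
print(clr_toy)
```

Note that \texttt{cut} requires unique breaks, so this approach needs
adjusting if several quantiles of the data coincide.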

\emph{Adding different symbols for tube stations and train stations}

Imagine that we now want to display all tube and train stations on top
of the previously created choropleth map. How would we do this? The
shape of points in R is determined by the \texttt{pch} argument, as
demonstrated by the result of entering the following code:
\texttt{plot(1:10, pch=1:10)}. To apply this knowledge to our map, we
could add the following code to the chunk added above (see figure 6):

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{levels}\NormalTok{(stations$LEGEND) }\CommentTok{# we want A roads and rapid transit stations (RTS)}
\NormalTok{sel <-}\StringTok{ }\KeywordTok{grepl}\NormalTok{(}\StringTok{"A Road Sing|Rapid"}\NormalTok{, stations$LEGEND) }\CommentTok{# selection for plotting }
\NormalTok{sym <-}\StringTok{ }\KeywordTok{as.integer}\NormalTok{(stations$LEGEND[sel]) }\CommentTok{# symbols}
\KeywordTok{points}\NormalTok{(stations[sel,], }\DataTypeTok{pch =} \NormalTok{sym)}
\KeywordTok{legend}\NormalTok{(}\DataTypeTok{legend =} \KeywordTok{c}\NormalTok{(}\StringTok{"A Road"}\NormalTok{, }\StringTok{"RTS"}\NormalTok{), }\StringTok{"bottomright"}\NormalTok{, }\DataTypeTok{pch =} \KeywordTok{unique}\NormalTok{(sym))}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## [1] "Railway Station"                           
## [2] "Rapid Transit Station"                     
## [3] "Roundabout, A Road Dual Carriageway"       
## [4] "Roundabout, A Road Single Carriageway"     
## [5] "Roundabout, B Road Dual Carriageway"       
## [6] "Roundabout, B Road Single Carriageway"     
## [7] "Roundabout, Minor Road over 4 metres wide" 
## [8] "Roundabout, Primary Route Dual Carriageway"
## [9] "Roundabout, Primary Route Single C'way"
\end{verbatim}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-31.pdf}
\caption{Symbol levels for train station types in London}
\end{figure}

In the above block of code, we first identified which types of transport
points are present in the map with \texttt{levels} (this command only
works on factor data, and tells us the unique levels that the vector can
hold). Next we select a subset of \texttt{stations} using a new command,
\texttt{grepl}, to determine which points we want to plot. Note that
\texttt{grepl}'s first argument is a text string (hence the quote marks)
and that the second is a factor (try typing
\texttt{class(stations\$LEGEND)} to test this). \texttt{grepl} uses
\emph{regular expressions} to test whether each element in a vector of
text or factor names matches the text pattern we want. In this case, we
are only interested in roundabouts on single carriageway A roads and
Rapid Transit Stations (RTS). Note the use of the vertical separator
\texttt{\textbar{}} to indicate that we want to match \texttt{LEGEND}
names that contain either ``A Road Sing'' \emph{or} ``Rapid''. Based on
the positive matches (saved as \texttt{sel}, a vector of \texttt{TRUE}
and \texttt{FALSE} values), we subset the stations. Finally we plot
these as points, using the integer code of their factor level to decide
the symbol, and add a legend. (See the documentation of
\texttt{?legend} for detail on the complexities of legend creation in
R's base graphics.)
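
The pattern matching can be tested in isolation on a short factor. The
labels below are copied from the \texttt{LEGEND} levels printed above,
but the vector \texttt{legend\_toy} itself is invented for illustration.

```r
# Hypothetical factor of feature types
legend_toy <- factor(c("Railway Station", "Rapid Transit Station",
                       "Roundabout, A Road Single Carriageway",
                       "Roundabout, B Road Dual Carriageway",
                       "Rapid Transit Station"))

# TRUE where the label contains "A Road Sing" or "Rapid"
sel_toy <- grepl("A Road Sing|Rapid", legend_toy)
print(sel_toy)

# Integer codes of the selected levels, usable as plotting symbols (pch)
sym_toy <- as.integer(legend_toy[sel_toy])
print(unique(sym_toy))
```

The order of \texttt{unique(sym\_toy)} follows the order in which the
types first appear in the data, so it is worth checking that legend
labels are supplied in the same order as the symbols.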

This may seem a frustrating and unintuitive way of altering map
graphics compared with something like QGIS. That's because it is! It may
not be worth pulling too much hair out over R's base graphics because
there is another option. Please skip to Part IV if you're itching to see
this more intuitive alternative.

\subsection{Optional task: aggregation with
gIntersects}\label{optional-task-aggregation-with-gintersects}

As with clipping, we can also do spatial aggregation with the rgeos
package. In some ways, this method makes explicit the steps taken in
\texttt{aggregate} `under the hood'. The code is quite involved and
intimidating, so feel free to skip this stage. Working through and
thinking about this alternative method may, however, yield dividends
if you intend to perform more sophisticated spatial analysis in R.

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(rgeos) }
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## rgeos version: 0.2-19, (SVN revision 394)
##  GEOS runtime version: 3.4.2-CAPI-1.8.2 r3921 
##  Polygon checking: TRUE
\end{verbatim}

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{int <-}\StringTok{ }\KeywordTok{gIntersects}\NormalTok{(stations, lnd, }\DataTypeTok{byid =} \OtherTok{TRUE}\NormalTok{) }\CommentTok{# re-run the intersection query}
\KeywordTok{head}\NormalTok{(}\KeywordTok{apply}\NormalTok{(int, }\DataTypeTok{MARGIN =} \DecValTok{2}\NormalTok{, }\DataTypeTok{FUN =} \NormalTok{which))}
\NormalTok{b.indexes <-}\StringTok{ }\KeywordTok{which}\NormalTok{(int, }\DataTypeTok{arr.ind =} \OtherTok{TRUE}\NormalTok{) }\CommentTok{# indexes that intersect}
\KeywordTok{summary}\NormalTok{(b.indexes)}
\NormalTok{b.names <-}\StringTok{ }\NormalTok{lnd$name[b.indexes[, }\DecValTok{1}\NormalTok{]]}
\NormalTok{b.count <-}\StringTok{ }\KeywordTok{aggregate}\NormalTok{(b.indexes ~}\StringTok{ }\NormalTok{b.names, }\DataTypeTok{FUN =} \NormalTok{length)}
\KeywordTok{head}\NormalTok{(b.count)}
\end{Highlighting}
\end{Shaded}

The above code first extracts the index of the row (borough) for which
the corresponding column is true and then converts this into names. The
final object created, \texttt{b.count}, contains the number of station
points in each zone. According to this, Barking and Dagenham should
contain 12 station points. It is important to check the output makes
sense at every stage with R, so let's check to see this is indeed the
case with a quick plot:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{plot}\NormalTok{(lnd[}\KeywordTok{grepl}\NormalTok{(}\StringTok{"Barking"}\NormalTok{, lnd$name),])}
\KeywordTok{points}\NormalTok{(stations)}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-34.pdf}
\caption{Transport points in Barking and Dagenham}
\end{figure}

Now the fun part: count the points in the polygon and report back how
many there are!
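
The index manipulation used with \texttt{gIntersects} above can be
explored on a small logical matrix. In this sketch the matrix
\texttt{int\_toy} is invented: its rows stand for zones and its columns
for points, with \texttt{TRUE} marking a point that falls in a zone.

```r
# Hypothetical 3 x 5 logical matrix: rows are zones, columns are points
int_toy <- matrix(c(TRUE,  FALSE, FALSE,
                    FALSE, TRUE,  FALSE,
                    TRUE,  FALSE, FALSE,
                    FALSE, FALSE, FALSE,
                    FALSE, TRUE,  FALSE),
                  nrow = 3,
                  dimnames = list(c("zoneA", "zoneB", "zoneC"),
                                  paste0("pt", 1:5)))

# Row and column positions of every TRUE cell
idx <- which(int_toy, arr.ind = TRUE)

# Count the points falling in each zone, keeping zones with no points
zone_hits <- factor(rownames(int_toy)[idx[, "row"]],
                    levels = rownames(int_toy))
counts <- table(zone_hits)
print(counts)

# Row sums of the logical matrix give the same counts
print(rowSums(int_toy))
```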

We have now seen how to load, join and clip data. The second half of
this tutorial is concerned with \emph{visualisation} of the results. For
this, we will use \textbf{ggplot2} and begin by looking at how it
handles non-spatial data.

\section{Part IV: Map making with
ggplot2}\label{part-iv-map-making-with-ggplot2}

This part introduces a slightly different method of creating plots
in R using the \href{http://ggplot2.org/}{ggplot2 package}, and explains
how it can make maps. The package is an implementation of the Grammar of
Graphics (Wilkinson 2005) - a general scheme for data visualisation that
breaks up graphs into semantic components such as scales and layers.
\textbf{ggplot2} can serve as a replacement for the base graphics in R
(the functions you have been plotting with today) and contains a number
of default options that match good visualisation practice.

The maps we produce will not be that meaningful - the focus here is on
sound visualisation with R and not sound analysis (obviously the value
of the former is diminished in the absence of the latter!). Whilst the
instructions are step by step you are encouraged to deviate from them
(trying different colours for example) to get a better understanding of
what we are doing.

\textbf{ggplot2} is one of the best documented packages in R. The full
documentation for it can be found online and it is recommended you test
out the examples on your own machines and play with them:
\url{http://docs.ggplot2.org/current/}.

Good examples of graphs can also be found on the website
\href{http://www.cookbook-r.com/Graphs/}{cookbook-r.com}.

Load the package:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(ggplot2)}
\end{Highlighting}
\end{Shaded}

It is worth noting that the basic \texttt{plot()} function requires no
data preparation but additional effort in colour selection/adding the
map key etc. \texttt{qplot()} and \texttt{ggplot()} (from the
\textbf{ggplot2} package) require some additional steps to format the
spatial data but select colours and add keys etc. automatically. More on
this later.

As a first attempt with \textbf{ggplot2} we can create a scatter plot
with the attribute data in the \texttt{lnd} object created above. Type:

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{p <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(lnd@data, }\KeywordTok{aes}\NormalTok{(Partic_Per, Pop_2001))}
\end{Highlighting}
\end{Shaded}

What you have just done is set up a ggplot object where you say where
you want the input data to come from. \texttt{lnd@data} is actually a
data frame contained within the wider spatial object \texttt{lnd} (the
\texttt{@} enables you to access the attribute table of the shapefile).
The characters inside the \texttt{aes} argument refer to the parts of
that data frame you wish to use (the variables \texttt{Partic\_Per} and
\texttt{Pop\_2001}). This has to happen within the brackets of
\texttt{aes()}, which means, roughly speaking `aesthetics that vary'.

If you just type p and hit enter you get the error
\texttt{No layers in plot}. This is because you have not told ggplot
what you want to do with the data. We do this by adding so-called
``geoms'', in this case \texttt{geom\_point()}.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{p +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{()}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-37.pdf}
\caption{A simple ggplot}
\end{figure}

Within the brackets you can alter the nature of the points. Try
something like \texttt{p + geom\_point(colour = "red", size=2)} and
experiment.

If you want to scale the points by borough population and colour them by
sports participation this is also fairly easy by adding another
\texttt{aes()} argument.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{p +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour=}\NormalTok{Partic_Per, }\DataTypeTok{size=}\NormalTok{Pop_2001))}
\end{Highlighting}
\end{Shaded}

The real power of \textbf{ggplot2} lies in its ability to add layers to
a plot. In this case we can add text to the plot.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{p +}\StringTok{ }\KeywordTok{geom_point}\NormalTok{(}\KeywordTok{aes}\NormalTok{(}\DataTypeTok{colour =} \NormalTok{Partic_Per, }\DataTypeTok{size =} \NormalTok{Pop_2001)) +}\StringTok{ }\KeywordTok{geom_text}\NormalTok{(}\DataTypeTok{size =} \DecValTok{2}\NormalTok{, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{label =} \NormalTok{name))}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-39.pdf}
\caption{ggplot for text}
\end{figure}

This idea of layers (or geoms) is quite different from the standard plot
functions in R, but you will find that each of the functions does a lot
of clever stuff to make plotting much easier (see the documentation for
a full list).

The following steps will create a map to show the percentage of the
population in each London Borough who regularly participate in sports
activities.

\subsection{``Fortifying'' spatial objects for ggplot2
maps}\label{fortifying-spatial-objects-for-ggplot2-maps}

To get the shapefiles into a format that can be plotted we have to use
the \texttt{fortify()} function. Spatial objects in R have a number of
slots containing the various items of data (polygon geometry,
projection, attribute information) associated with a shapefile. Slots
can be thought of as shelves within the data object that contain the
different attributes. The ``polygons'' slot contains the geometry of the
polygons in the form of the XY coordinates used to draw the polygon
outline. The generic plot function can work out what to do with these,
but \textbf{ggplot2} cannot. We therefore need to extract them as a data
frame. The \texttt{fortify} function was written specifically for this
purpose. For this to work, either the \textbf{maptools} or the
\textbf{rgeos} package must be installed.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_f <-}\StringTok{ }\KeywordTok{fortify}\NormalTok{(lnd, }\DataTypeTok{region =} \StringTok{"ons_label"}\NormalTok{) }\CommentTok{# you may need to load maptools}
\end{Highlighting}
\end{Shaded}

This step has lost the attribute information associated with the lnd
object. We can add it back using the merge function (this performs a
data join). To find out how this function works look at the output of
typing \texttt{?merge}.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_f <-}\StringTok{ }\KeywordTok{merge}\NormalTok{(lnd_f, lnd@data, }\DataTypeTok{by.x =} \StringTok{"id"}\NormalTok{, }\DataTypeTok{by.y =} \StringTok{"ons_label"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

Take a look at the \texttt{lnd\_f} object to see its contents. You
should see a large data frame containing the latitude and longitude
(they are actually Easting and Northing as the data are in British
National Grid format) coordinates alongside the attribute information
associated with each London Borough. If you type \texttt{print(lnd\_f)}
you will see just how many coordinate pairs are required! To keep the
output to a minimum, take a peek at the object using just the
\texttt{head} command:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{head}\NormalTok{(lnd_f[, }\DecValTok{1}\NormalTok{:}\DecValTok{8}\NormalTok{])}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
##     id   long    lat order  hole piece  group           name
## 1 00AA 531027 181611     1 FALSE     1 00AA.1 City of London
## 2 00AA 531555 181659     2 FALSE     1 00AA.1 City of London
## 3 00AA 532136 182198     3 FALSE     1 00AA.1 City of London
## 4 00AA 532946 181895     4 FALSE     1 00AA.1 City of London
## 5 00AA 533411 182038     5 FALSE     1 00AA.1 City of London
## 6 00AA 533843 180794     6 FALSE     1 00AA.1 City of London
\end{verbatim}
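
The join between vertex coordinates and attributes can be sketched with
two invented data frames (\texttt{coords} and \texttt{attrs}; neither
comes from the London data). The sketch also re-sorts the result by the
\texttt{order} column, on the assumption that the drawing order of the
vertices should be restored after \texttt{merge}, which sorts its output
by the key.

```r
# Hypothetical "fortified" coordinates: one row per polygon vertex
coords <- data.frame(id    = c("z2", "z2", "z2", "z1", "z1", "z1"),
                     long  = c(5, 6, 5, 0, 1, 0),
                     lat   = c(5, 5, 6, 0, 0, 1),
                     order = 1:6)
# Hypothetical attribute table: one row per zone
attrs <- data.frame(code = c("z1", "z2"), rate = c(12.5, 20.1))

# Join attributes onto every vertex; the key columns have different names
coords_m <- merge(coords, attrs, by.x = "id", by.y = "code")

# merge() may reorder rows, so restore the drawing order afterwards
coords_m <- coords_m[order(coords_m$order), ]
print(coords_m)
```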

It is now straightforward to produce a map using all the built-in tools
(such as setting the breaks in the data) that \textbf{ggplot2} has to
offer. \texttt{coord\_equal()} is the equivalent of \texttt{asp = T} in
regular plots with R:

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{map <-}\StringTok{ }\KeywordTok{ggplot}\NormalTok{(lnd_f, }\KeywordTok{aes}\NormalTok{(long, lat, }\DataTypeTok{group =} \NormalTok{group, }\DataTypeTok{fill =} \NormalTok{Partic_Per)) +}
\StringTok{  }\KeywordTok{geom_polygon}\NormalTok{() +}
\StringTok{  }\KeywordTok{coord_equal}\NormalTok{() +}
\StringTok{  }\KeywordTok{labs}\NormalTok{(}\DataTypeTok{x =} \StringTok{"Easting (m)"}\NormalTok{, }\DataTypeTok{y =} \StringTok{"Northing (m)"}\NormalTok{, }\DataTypeTok{fill =} \StringTok{"% Sports}\CharTok{\textbackslash{}n}\StringTok{Participation"}\NormalTok{) +}
\StringTok{  }\KeywordTok{ggtitle}\NormalTok{(}\StringTok{"London Sports Participation"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

Now, just typing \texttt{map} should result in your first ggplot-made
map of London! There is a lot going on in the code above, so think about
it line by line: what has each element of the code been designed to do?
Also note how the \texttt{aes()} components can be combined into one set
of brackets after \texttt{ggplot}, where they apply to all layers,
rather than being broken into separate parts as we did above. The
different plot functions still know what to do with these. The
\texttt{group = group} argument points ggplot to the \texttt{group}
column added by \texttt{fortify()}, which identifies the groups of
coordinates that pertain to individual polygons (in this case London
Boroughs).
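
The role of the \texttt{group} aesthetic is easier to see on a tiny
example. The sketch below uses a made-up data frame
(\texttt{squares}, with hypothetical values) that mimics the layout of a
fortified object: one row per vertex, with a \texttt{group} column
saying which vertices belong to which polygon.

\begin{verbatim}
library(ggplot2)

# Two made-up unit squares; 'group' says which vertices form which polygon
squares <- data.frame(
  long  = c(0, 1, 1, 0,  2, 3, 3, 2),
  lat   = c(0, 0, 1, 1,  0, 0, 1, 1),
  group = rep(c("a.1", "b.1"), each = 4),
  value = rep(c(10, 20), each = 4)
)

# aes() set inside ggplot() is inherited by every layer added afterwards
p <- ggplot(squares, aes(long, lat, group = group, fill = value)) +
  geom_polygon() +
  coord_equal()
p  # the eight vertices are drawn as two separate squares
\end{verbatim}

Removing \texttt{group = group} from the call is a quick way to see why
the grouping is needed.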

The default colours are really nice but we may wish to produce the map
in black and white. The code below should produce a map like the one
shown in the figure (try changing the colours too):

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{map +}\StringTok{ }\KeywordTok{scale_fill_gradient}\NormalTok{(}\DataTypeTok{low =} \StringTok{"white"}\NormalTok{, }\DataTypeTok{high =} \StringTok{"black"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-44.pdf}
\caption{Greyscale map}
\end{figure}
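
As a starting point for experimenting with colours, here is a minimal
sketch on made-up data (the \texttt{toy} data frame and its values are
hypothetical). It uses the same \texttt{scale\_fill\_gradient} function
with different end colours and sets the legend breaks explicitly through
its \texttt{breaks} argument:

\begin{verbatim}
library(ggplot2)

# Made-up participation values for three square "boroughs"
toy <- data.frame(
  long = c(0, 1, 1, 0,  1, 2, 2, 1,  2, 3, 3, 2),
  lat  = rep(c(0, 0, 1, 1), times = 3),
  group = rep(c("a", "b", "c"), each = 4),
  Partic_Per = rep(c(10, 20, 30), each = 4)
)

toy_map <- ggplot(toy, aes(long, lat, group = group, fill = Partic_Per)) +
  geom_polygon() +
  coord_equal()

# Different end colours and explicit legend breaks
toy_col <- toy_map +
  scale_fill_gradient(low = "lightyellow", high = "darkred",
                      breaks = c(10, 20, 30))
toy_col
\end{verbatim}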

Saving plot images is also easy. You just need to call \texttt{ggsave}
after each plot, e.g. \texttt{ggsave("my\_map.pdf")} will save the most
recent plot as a PDF with default settings. For a larger map, you could
try the following:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{ggsave}\NormalTok{(}\StringTok{"large_plot.png"}\NormalTok{, }\DataTypeTok{scale =} \DecValTok{3}\NormalTok{, }\DataTypeTok{dpi =} \DecValTok{400}\NormalTok{)}
\end{Highlighting}
\end{Shaded}
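
If you prefer to control the output size directly rather than through
\texttt{scale}, \texttt{ggsave} also accepts \texttt{plot},
\texttt{width}, \texttt{height} and \texttt{units} arguments. The
following is a minimal sketch on a made-up plot; the file name and
dimensions are arbitrary choices for illustration:

\begin{verbatim}
library(ggplot2)

# A tiny made-up plot to save
p <- ggplot(data.frame(x = 1:3, y = c(2, 4, 3)), aes(x, y)) + geom_line()

# Write to a temporary file so nothing is left in the working directory
out_file <- file.path(tempdir(), "toy_plot.pdf")
ggsave(out_file, plot = p, width = 10, height = 8, units = "cm")
file.exists(out_file)
\end{verbatim}

Passing \texttt{plot = p} explicitly avoids relying on whichever plot
happened to be displayed last.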

\subsection{Adding base maps to ggplot2 with
ggmap}\label{adding-base-maps-to-ggplot2-with-ggmap}

\href{http://journal.r-project.org/archive/2013-1/kahle-wickham.pdf}{ggmap}
is a package that uses the \textbf{ggplot2} syntax as a template to
create maps with image tiles taken from map servers such as Google and
\href{http://www.openstreetmap.org/}{OpenStreetMap}:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(ggmap) }\CommentTok{# you may have to use install.packages to install it first}
\end{Highlighting}
\end{Shaded}

The \texttt{lnd} object loaded previously is in British National Grid
but the ggmap image tiles are in WGS84. We therefore need to use the
\texttt{lnd\_sport\_wgs84} object created in the reprojection operation
earlier.

The first job is to calculate the bounding box (bb for short) of the
\texttt{lnd\_sport\_wgs84} object to identify the geographic extent of
the image tiles that we need.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{b <-}\StringTok{ }\KeywordTok{bbox}\NormalTok{(lnd_sport_wgs84)}
\NormalTok{b[}\DecValTok{1}\NormalTok{, ] <-}\StringTok{ }\NormalTok{(b[}\DecValTok{1}\NormalTok{, ] -}\StringTok{ }\KeywordTok{mean}\NormalTok{(b[}\DecValTok{1}\NormalTok{, ])) *}\StringTok{ }\FloatTok{1.05} \NormalTok{+}\StringTok{ }\KeywordTok{mean}\NormalTok{(b[}\DecValTok{1}\NormalTok{, ])}
\NormalTok{b[}\DecValTok{2}\NormalTok{, ] <-}\StringTok{ }\NormalTok{(b[}\DecValTok{2}\NormalTok{, ] -}\StringTok{ }\KeywordTok{mean}\NormalTok{(b[}\DecValTok{2}\NormalTok{, ])) *}\StringTok{ }\FloatTok{1.05} \NormalTok{+}\StringTok{ }\KeywordTok{mean}\NormalTok{(b[}\DecValTok{2}\NormalTok{, ])}
\CommentTok{# scale longitude and latitude (increase bb by 5% for plot)}
\CommentTok{# replace 1.05 with 1.xx for an xx% increase in the plot size}
\end{Highlighting}
\end{Shaded}
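
The two scaling lines above stretch each row of the bounding box about
its midpoint. As a minimal sketch, the same arithmetic can be wrapped in
a helper (\texttt{expand\_bbox} is a hypothetical name, not part of
\textbf{sp} or \textbf{ggmap}) and tried on a made-up matrix laid out
like the output of \texttt{bbox}:

\begin{verbatim}
# Hypothetical helper: scale a 2 x 2 bounding box about its centre
expand_bbox <- function(b, factor = 1.05) {
  centre <- rowMeans(b)           # midpoint of each row (x and y)
  (b - centre) * factor + centre  # stretch the distance from the midpoint
}

# Made-up bounding box (rows: x and y; columns: min and max)
b_toy <- matrix(c(-0.5, 51.3, 0.3, 51.7), nrow = 2,
                dimnames = list(c("x", "y"), c("min", "max")))
expand_bbox(b_toy)
\end{verbatim}

Each minimum moves down and each maximum moves up by the same amount, so
the centre of the box stays where it was.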

This is then fed into the \texttt{get\_map} function as the
\texttt{location} argument. The code below nests two functions:
\texttt{get\_map} downloads the base map data and \texttt{ggmap} turns
it into a plot.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_b1 <-}\StringTok{ }\KeywordTok{ggmap}\NormalTok{(}\KeywordTok{get_map}\NormalTok{(}\DataTypeTok{location =} \NormalTok{b)) }\CommentTok{# create basemap for london}
\end{Highlighting}
\end{Shaded}
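
The help page for \texttt{get\_map} describes a bounding-box
\texttt{location} as left, bottom, right, top. Assuming that ordering, a
2 x 2 matrix laid out like the output of \texttt{bbox} already flattens
into the right sequence, but the edges can also be named explicitly.
The sketch below uses a made-up matrix (\texttt{b\_demo} and its values
are hypothetical) and does not download anything:

\begin{verbatim}
# Made-up bounding box with the same layout as the output of bbox()
b_demo <- matrix(c(-0.51, 51.28, 0.33, 51.69), nrow = 2,
                 dimnames = list(c("x", "y"), c("min", "max")))

# Spell out the four edges by name instead of relying on the matrix layout
b_named <- c(left = b_demo["x", "min"], bottom = b_demo["y", "min"],
             right = b_demo["x", "max"], top = b_demo["y", "max"])
b_named
\end{verbatim}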

In much the same way as we did above we can then layer the plot with
different geoms.

First fortify the \texttt{lnd\_sport\_wgs84} object and then merge with
the required attribute data (we already did this step to create the
\texttt{lnd\_f} object).

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_wgs84_f <-}\StringTok{ }\KeywordTok{fortify}\NormalTok{(lnd_sport_wgs84, }\DataTypeTok{region =} \StringTok{"ons_label"}\NormalTok{)}
\NormalTok{lnd_wgs84_f <-}\StringTok{ }\KeywordTok{merge}\NormalTok{(lnd_wgs84_f, lnd_sport_wgs84@data,}
                      \DataTypeTok{by.x =} \StringTok{"id"}\NormalTok{, }\DataTypeTok{by.y =} \StringTok{"ons_label"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

We can now overlay this on our base map.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_b1 +}
\StringTok{  }\KeywordTok{geom_polygon}\NormalTok{(}\DataTypeTok{data =} \NormalTok{lnd_wgs84_f,}
               \KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =} \NormalTok{long, }\DataTypeTok{y =} \NormalTok{lat, }\DataTypeTok{group =} \NormalTok{group, }\DataTypeTok{fill =} \NormalTok{Partic_Per),}
               \DataTypeTok{alpha =} \FloatTok{0.5}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

The code above contains a lot of parameters. Use the \textbf{ggplot2}
help pages to find out what they are. The resulting map looks okay, but
it would be improved with a simpler base map in black and white. A
design firm called Stamen provides the tiles we need, and they can be
brought into the plot with the \texttt{get\_map} function:

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# download basemap (use load("data/lnd_b2.RData") if you have no internet)}
\NormalTok{lnd_b2 <-}\StringTok{ }\KeywordTok{ggmap}\NormalTok{(}\KeywordTok{get_map}\NormalTok{(}\DataTypeTok{location =} \NormalTok{b, }\DataTypeTok{source =} \StringTok{"stamen"}\NormalTok{,}
          \DataTypeTok{maptype =} \StringTok{"toner"}\NormalTok{, }\DataTypeTok{crop =} \OtherTok{TRUE}\NormalTok{))}
\end{Highlighting}
\end{Shaded}

We can then produce the plot as before:

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(mapproj) }\CommentTok{# mapproj library needed - install.packages("mapproj")}
\end{Highlighting}
\end{Shaded}

\begin{verbatim}
## Loading required package: maps
\end{verbatim}

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_b2 +}
\StringTok{  }\KeywordTok{geom_polygon}\NormalTok{(}\DataTypeTok{data =} \NormalTok{lnd_wgs84_f,}
               \KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =} \NormalTok{long, }\DataTypeTok{y =} \NormalTok{lat, }\DataTypeTok{group =} \NormalTok{group, }\DataTypeTok{fill =} \NormalTok{Partic_Per),}
               \DataTypeTok{alpha =} \FloatTok{0.5}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-54.pdf}
\caption{Basemap 2}
\end{figure}

Finally, to increase the detail of the base map, we can use
\texttt{get\_map}'s \texttt{zoom} argument (result not shown):

\begin{Shaded}
\begin{Highlighting}[]
\CommentTok{# download basemap (try load("data/lnd_b3.RData") if you lack internet)}
\NormalTok{lnd_b3 <-}\StringTok{ }\KeywordTok{ggmap}\NormalTok{(}\KeywordTok{get_map}\NormalTok{(}\DataTypeTok{location =} \NormalTok{b, }\DataTypeTok{source =} \StringTok{"stamen"}\NormalTok{,}
  \DataTypeTok{maptype =} \StringTok{"toner"}\NormalTok{, }\DataTypeTok{crop =} \OtherTok{TRUE}\NormalTok{, }\DataTypeTok{zoom =} \DecValTok{11}\NormalTok{))}
\end{Highlighting}
\end{Shaded}

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_b3 +}
\StringTok{  }\KeywordTok{geom_polygon}\NormalTok{(}\DataTypeTok{data =} \NormalTok{lnd_wgs84_f,}
               \KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =} \NormalTok{long, }\DataTypeTok{y =} \NormalTok{lat, }\DataTypeTok{group =} \NormalTok{group, }\DataTypeTok{fill =} \NormalTok{Partic_Per),}
               \DataTypeTok{alpha =} \FloatTok{0.5}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

\subsection{Advanced Task: Faceting for
Maps}\label{advanced-task-faceting-for-maps}

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{library}\NormalTok{(reshape2) }\CommentTok{# this may not be installed.}
\CommentTok{# If not install it, or skip the next two steps}
\end{Highlighting}
\end{Shaded}

Load the data, which show historic population values between 1801 and
2001 for London, again from the London Datastore.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{london_data <-}\StringTok{ }\KeywordTok{read.csv}\NormalTok{(}\StringTok{"data/census-historic-population-borough.csv"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

``Melt'' the data so that the columns become rows.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_molten <-}\StringTok{ }\KeywordTok{melt}\NormalTok{(london_data, }\DataTypeTok{id =} \KeywordTok{c}\NormalTok{(}\StringTok{"Area.Code"}\NormalTok{, }\StringTok{"Area.Name"}\NormalTok{))}
\end{Highlighting}
\end{Shaded}
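
To see what melting does, here is a minimal sketch on a made-up wide
table (the \texttt{Persons.*} column names and all values are
hypothetical and may not match the real CSV file). The identifier
columns are kept, and every other column is stacked into a
\texttt{variable}/\texttt{value} pair:

\begin{verbatim}
library(reshape2)

# Made-up wide table: one row per area, one column per census year
wide <- data.frame(Area.Code = c("00AA", "00AB"),
                   Area.Name = c("City of London", "Barking and Dagenham"),
                   Persons.1801 = c(100, 200),
                   Persons.1811 = c(110, 220))

# id.vars stay as columns; every other column is stacked into variable/value
long <- melt(wide, id.vars = c("Area.Code", "Area.Name"))
long
\end{verbatim}

The result has one row per area and year, which is the shape that
faceting by \texttt{variable} expects.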

Merge the population data with the London borough geometry contained
within our \texttt{lnd\_f} object.

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_f <-}\StringTok{ }\KeywordTok{merge}\NormalTok{(lnd_f, lnd_molten, }\DataTypeTok{by.x =} \StringTok{"id"}\NormalTok{, }\DataTypeTok{by.y =} \StringTok{"Area.Code"}\NormalTok{)}
\end{Highlighting}
\end{Shaded}

Reorder this data (ordering is important for plots).

\begin{Shaded}
\begin{Highlighting}[]
\NormalTok{lnd_f <-}\StringTok{ }\NormalTok{lnd_f[}\KeywordTok{order}\NormalTok{(lnd_f$order), ]}
\end{Highlighting}
\end{Shaded}
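
The reason for this step is that \texttt{merge} sorts its result by the
key column by default, which can scramble the sequence in which polygon
vertices should be drawn. A minimal sketch with made-up data (the
objects \texttt{coords} and \texttt{attrs} are hypothetical):

\begin{verbatim}
# Made-up vertices for two polygons, numbered in drawing order
coords <- data.frame(id = c("b", "b", "a", "a"),
                     order = 1:4)
attrs <- data.frame(id = c("a", "b"), value = c(10, 20))

# merge() sorts its result by the key column, so drawing order is lost
joined <- merge(coords, attrs, by = "id")
joined$order

# Sorting on the order column restores the original vertex sequence
joined <- joined[order(joined$order), ]
joined$order  # 1 2 3 4
\end{verbatim}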

We can now use faceting to produce one map per year, as displayed in
figure 13 (this may take a little while to appear).

\begin{Shaded}
\begin{Highlighting}[]
\KeywordTok{ggplot}\NormalTok{(}\DataTypeTok{data =} \NormalTok{lnd_f, }\KeywordTok{aes}\NormalTok{(}\DataTypeTok{x =} \NormalTok{long, }\DataTypeTok{y =} \NormalTok{lat, }\DataTypeTok{fill =} \NormalTok{value, }\DataTypeTok{group =} \NormalTok{group)) +}
\StringTok{  }\KeywordTok{geom_polygon}\NormalTok{() +}
\StringTok{  }\KeywordTok{geom_path}\NormalTok{(}\DataTypeTok{colour=}\StringTok{"grey"}\NormalTok{, }\DataTypeTok{lwd=}\FloatTok{0.1}\NormalTok{) +}
\StringTok{  }\KeywordTok{coord_equal}\NormalTok{() +}
\StringTok{  }\KeywordTok{facet_wrap}\NormalTok{(~}\StringTok{ }\NormalTok{variable)}
\end{Highlighting}
\end{Shaded}

\begin{figure}[htbp]
\centering
\includegraphics{./intro-spatial_files/figure-latex/unnamed-chunk-64.pdf}
\caption{Faceted map}
\end{figure}
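
The faceting mechanism itself can be explored on a much smaller example.
The sketch below builds a made-up data frame (\texttt{toy}, with
hypothetical years and values) containing two squares repeated for two
years, then draws one panel per year:

\begin{verbatim}
library(ggplot2)

# Two made-up squares, each repeated for two hypothetical years
sq_a <- data.frame(long = c(0, 1, 1, 0), lat = c(0, 0, 1, 1))
sq_b <- transform(sq_a, long = long + 1)
toy <- rbind(
  cbind(sq_a, group = "a", variable = "1801", value = 10),
  cbind(sq_a, group = "a", variable = "1901", value = 40),
  cbind(sq_b, group = "b", variable = "1801", value = 20),
  cbind(sq_b, group = "b", variable = "1901", value = 30)
)

# facet_wrap() draws one panel per distinct value of 'variable'
p_facet <- ggplot(toy, aes(long, lat, group = group, fill = value)) +
  geom_polygon() +
  coord_equal() +
  facet_wrap(~ variable)
p_facet
\end{verbatim}

All panels share one fill scale, so colours are comparable across years.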

Again there is a lot going on here so explore the documentation to make
sure you understand it. Try out different colour values as well.

Add a title and replace the axes names with ``easting'' and ``northing''
and save your map as a pdf.

\section{Part V: Taking spatial data analysis in R
further}\label{part-v-taking-spatial-data-analysis-in-r-further}

The skills taught in this tutorial are applicable to a very wide range
of situations, spatial or not. Often experimentation is the most
rewarding learning method, rather than just searching for the `best' way
of doing something (Kabacoff, 2011). We recommend you play around with
your data.

If you would like to learn more about R's spatial functionalities,
including more exercises on loading, saving and manipulating data, we
recommend a slightly longer and more advanced tutorial (Cheshire and
Lovelace, 2014). An up-to-date repository of this project, including
example data and all the code used, can be found on its GitHub page:
\href{https://github.com/geocomPP/sdvwR}{github.com/geocomPP/sdvwR}.
There are also a number of bonus `vignettes' associated with the present
tutorial. These can be found on the
\href{https://github.com/Robinlovelace/Creating-maps-in-R/tree/master/vignettes}{vignettes
page} of the project's repository.

Another advanced tutorial is ``Using spatial data'', which has example
code and data that can be downloaded from the
\href{http://www.edii.uclm.es/~useR-2013//Tutorials/Bivand.html}{useR
2013 conference page}. Such lengthy tutorials are worth doing to think
about spatial data in R systematically, rather than seeing R as a
discrete collection of functions. In R the whole is greater than the sum
of its parts.

The supportive online communities surrounding large open source programs
such as R are one of their greatest assets, so we recommend you become
an active
``\href{http://blog.cleverelephant.ca/2013/10/being-open-source-citizen.html}{open
source citizen}'' rather than a passive consumer (Ramsey \& Dubovsky,
2013).

This does not necessarily mean writing a new package or contributing to
R's `Core Team' - it can simply involve helping others use R. We
therefore conclude the tutorial with a list of resources that will help
you further sharpen your R skills, find help and contribute to the
growing online R community:

\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  R's homepage hosts a wealth of
  \href{http://cran.r-project.org/manuals.html}{official} and
  \href{http://cran.r-project.org/other-docs.html}{contributed} guides.
\item
  Stack Exchange and GIS Stack Exchange groups - try searching for
  ``{[}R{]}''. If your issue has not been addressed yet, you
  could post a polite question.
\item
  R's \href{http://www.r-project.org/mail.html}{mailing lists} - the
  R-sig-geo list may be of particular interest here.
\end{itemize}

Books: despite the strength of R's online community, nothing beats a
physical book for concentrated learning. We would particularly recommend
the following:

\begin{itemize}
\itemsep1pt\parskip0pt\parsep0pt
\item
  ggplot2: elegant graphics for data analysis (Wickham 2009).
\item
  Bivand et al. (2013) provide a dense and detailed overview of spatial
  data analysis.
\item
  Kabacoff (2011) is a more general R book; it has many fun worked
  examples.
\end{itemize}

\section{R quick reference}\label{r-quick-reference}

\texttt{\#}: comments all text until line end

\texttt{df \textless{}- data.frame(x = 1:9, y = (1:9)\^{}2)}: create new
object of class \texttt{data.frame}, called df, and assign values

\texttt{help(plot)}: ask R for basic help on function, the same as
\texttt{?plot}. Replace \texttt{plot} with any function (e.g.
\texttt{spTransform}).

\texttt{library(ggplot2)}: load a package (replace \textbf{ggplot2} with
your package name)

\texttt{install.packages("ggplot2")}: install package - note quotation
marks

\texttt{setwd("C:/Users/username/Desktop/")}: set R's \emph{working
directory} (set it to your project's folder)

\texttt{nrow(df)}: count the number of rows in the object \texttt{df}

\texttt{summary(df)}: summary statistics of the object \texttt{df}

\texttt{head(df)}: display first 6 lines of object \texttt{df}

\texttt{plot(df)}: plot object \texttt{df}

\texttt{save(df, file = "C:/Users/username/Desktop/df.RData")}: save the
\texttt{df} object to the specified file

\texttt{rm(df)}: remove the \texttt{df} object

\texttt{proj4string(df)}: query coordinate reference system of
\texttt{df} object

\texttt{spTransform(df, CRS("+init=epsg:4326"))}: reproject \texttt{df}
object to WGS84
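
Several of the base R commands above can be tried together in one short
session. This is a minimal sketch that writes to a temporary file (the
file name \texttt{df.RData} is an arbitrary choice) so that it does not
depend on any particular folder existing:

\begin{verbatim}
df <- data.frame(x = 1:9, y = (1:9)^2)  # create a data frame
nrow(df)     # 9
head(df)     # first 6 rows
summary(df)  # summary statistics per column

# Save to a temporary file, remove the object, then restore it
tmp <- file.path(tempdir(), "df.RData")
save(df, file = tmp)
rm(df)
load(tmp)
exists("df")  # TRUE
\end{verbatim}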

\section{Acknowledgements}\label{aknowledgements}

The tutorial was developed for a series of Short Courses funded by the
National Centre for Research Methods (NCRM), via the TALISMAN node (see
\href{http://www.geotalisman.org/}{geotalisman.org}). Thanks to the
\href{http://www.esrc.ac.uk/}{ESRC} for funding applied methods
research. Many thanks to Rachel Oldroyd and Alistair Leak who helped
demonstrate these materials on the NCRM short courses for which this
tutorial was developed. Amy O'Neill organised the course and encouraged
feedback from participants. Final thanks go to all users and
developers of open source software for making powerful tools such as R
accessible and enjoyable to use.

\section{References}\label{references}

Bivand, R. S., Pebesma, E. J., \& G\'omez-Rubio, V. (2008). Applied
spatial data analysis with R. Springer.

Cheshire, J. \& Lovelace, R. (2014). Manipulating and visualizing
spatial data with R. Book chapter in Press.

Harris, R. (2012). A Short Introduction to R.
\href{http://www.social-statistics.org/}{social-statistics.org}.

Johnson, P. E. (2013). R Style. An Rchaeological Commentary. The
Comprehensive R Archive Network.

Kabacoff, R. (2011). R in Action. Manning Publications Co.

Ramsey, P., \& Dubovsky, D. (2013). Geospatial Software's Open Future.
GeoInformatics, 16(4).

Torfs, P., \& Brauer, C. (2012). A (very) short Introduction to R. The
Comprehensive R Archive Network.

Wickham, H. (2009). ggplot2: elegant graphics for data analysis.
Springer.

Wickham, H. (in press). \href{http://adv-r.had.co.nz/}{Advanced R}. CRC
Press.

Wilkinson, L. (2005). The grammar of graphics. Springer.

\end{document}