intro-spatial.Rmd

---
title: "Introduction to visualising spatial data in R"
author: "Robin Lovelace (R.Lovelace@Leeds.ac.uk) and James Cheshire (james.cheshire@ucl.ac.uk)"
date: "July, 2014"
output:
  pdf_document:
    fig_cap: yes
    fig_height: 3.5
    fig_width: 4.5
    keep_tex: yes
    toc: yes
    highlight: pygments
---

\newpage

```{r, echo=FALSE}
# rob00x-at-gmail-o-com
# ~~~___~~The future ~
# ~~/__/~|~~ is low ~
# ~/~~|~~|_ energy.~
# robinlovelace.net
```


Part I: Introduction
========================================================

This tutorial is an introduction to spatial data in R and map making with
R's 'base' graphics and the popular graphics package **ggplot2**.
It assumes no prior knowledge of spatial data analysis but
prior understanding of the R command line would be beneficial.
For people new to R, we recommend working through an
'Introduction to R' type tutorial, such as
"A (very) short introduction to R"
([Torfs and Brauer, 2012](http://cran.r-project.org/doc/contrib/Torfs+Brauer-Short-R-Intro.pdf))
or the more geographically inclined "Short introduction to R"
([Harris, 2012](http://www.social-statistics.org/wp-content/uploads/2012/12/intro_to_R1.pdf)).

Building on such background material,
the following set of exercises is concerned with specific functions for spatial data
and visualisation. It is divided into five parts:

- Introduction, which provides a guide to R's syntax and preparing for the tutorial
- Spatial data in R, which describes basic spatial functions in R
- Manipulating spatial data, which includes changing projection, clipping and spatial joins
- Map making with **ggplot2**, a recent graphics package for producing beautiful maps quickly
- Taking spatial analysis in R further, a compilation of resources for furthering your skills

An up-to-date version of this tutorial is maintained at
[https://github.com/Robinlovelace/Creating-maps-in-R](https://github.com/Robinlovelace/Creating-maps-in-R/blob/master/intro-spatial-rl.pdf). The source files used to create this tutorial, including
the input data can be downloaded as a
[zip file](https://github.com/Robinlovelace/Creating-maps-in-R/archive/master.zip),
as described below. The entire tutorial was written in
[RMarkdown](http://rmarkdown.rstudio.com/), which
allows R code to run as the document compiles, ensuring reproducibility.

Any suggested improvements or new
[vignettes](https://github.com/Robinlovelace/Creating-maps-in-R/tree/master/vignettes) are welcome, via email
to Robin or by [forking](https://help.github.com/articles/fork-a-repo)
the [master version](https://github.com/Robinlovelace/Creating-maps-in-R/blob/master/intro-spatial.Rmd) of this document.

## Typographic conventions and getting help

The colourful syntax highlighting in this document is thanks to
[RMarkdown](http://rmarkdown.rstudio.com/).
We try to follow best practice in terms of style, roughly following
Google's style guide, an in-depth guide written by
[Johnson (2013)](http://cran.r-project.org/web/packages/rockchalk/vignettes/Rstyle.pdf)
and a [chapter](http://adv-r.had.co.nz/Style.html) from
[*Advanced R*](http://adv-r.had.co.nz/) (Wickham, in press).
It is a good idea to get into the habit of consistent and clear writing in
any language, and R is no exception. Adding comments to your code is also
good practice, so you remember at a later date what you've done, aiding the
learning process. There are two main ways of commenting code using the `#` symbol:
above a line of code or directly following it, as illustrated in the block of
code presented below, which should create figure 1
if typed correctly into the R command line.

```{r fig.cap="Basic plot of x and y"}
# Generate data
x <- 1:400
y <- sin(x / 10) * exp(x * -0.01)

plot(x, y) # plot the result
```

In the above code we first created a new *object* that we have called `x`.
Any name could have been used, like `x_bumkin`, but `x` is concise
and works just fine here. It is good practice to give your objects meaningful names.
Note `<-`, the directional "arrow" assignment symbol. This creates new objects.
We will be using this symbol a lot in the
tutorial.^[Tip: typing `Alt -` on the keyboard will create it in RStudio.
The equals sign `=` also works but is not recommended by R developers.]

To distinguish between prose and code, please be aware of the following typographic conventions: R code (e.g. `plot(x, y)`) is
written in a `monospace` font and package names (e.g. **rgdal**)
are written in **bold**. Blocks of code such as:

```{r}
c(1:3, 5)^2
```

are compiled in-line: the `##` indicates output from R. Sometimes
output from code in this tutorial is reduced to save space, so do
not be alarmed if R produces unexpected `warning` messages.
In a few cases we omit images to save space. This will be clear from
the comments.
Images in this document are small and low-quality enable portability of the
pdf. They should
display better on your computer screen and can be saved at any resolution.

The code presented here is not the only way to do things: we encourage you to
play to gain a deeper understanding of R.
Do not worry, you cannot 'break' anything using R and all the input data
can be re-loaded if things do go wrong. As with learning to skate, you learn
by falling and, getting an `Error:` message in R is much less
painful than landing on ones face on concrete! We encourage `Error:`s - it
means you are trying new things.

If you require help on any function, use the `help` function,
e.g. `help(plot)`. Because R users love being concise,
this can also be written as `?plot`. Feel free to use it
at any point you'd like more detail on a specific function
(although R's help files are famously cryptic for the un-initiated).
Help on more general terms can be found using the `??` symbol. To test this,
try typing `??regression`.
For the most part, *learning by doing* is a good motto, so let's crack
on and download some packages and then some data.

## Prerequisites and packages

For this tutorial you need to install R, if you haven't already done so, the latest version of which
can be downloaded from [http://cran.r-project.org/](http://cran.r-project.org/).
A number of R editors such as [RStudio](http://www.rstudio.com/)
can be used to make R more user friendly,
but these are not needed to complete the tutorial.

R has a huge and growing number of spatial data packages.
We recommend taking a quick browse on R's main website:
[http://cran.r-project.org/web/views/Spatial.html](http://cran.r-project.org/web/views/Spatial.html).

The packages we will be using are
**ggplot2**, **rgdal**, **rgeos**, **maptools**, **mapproj** and **ggmap**.
To test whether a package is installed, **ggplot2** for example, enter `library(ggplot2)`.
If you get an error message, it needs to be installed: `install.packages("ggplot2")`.
These will be downloaded from CRAN (the Comprehensive R Archive Network); if you are prompted
to select a 'mirror', select one that is close to your home.
If there is no output from R, this is good news: it means that the library
has already been installed on your computer.
Install these packages now.

# Part II: Spatial data in R

## Starting the tutorial

Now that we have taken a look at R's syntax and installed the
necessary packages,
we can start looking at some real spatial data. This second part introduces some
spatial files that we will download from the internet. Plotting
and interrogating spatial objects are central spatial data
analysis in R, so we will focus on these elements in the next two parts of the tutorial,
before focussing on creating attractive maps in Part IV.

## Downloading the data

Download the data for this tutorial now from :
[https://github.com/Robinlovelace/Creating-maps-in-R](https://github.com/Robinlovelace/Creating-maps-in-R).
Click on the "Download ZIP" button on the right hand side and once it is downloaded unzip this to a new folder on your PC.
Use the `setwd` command to set the working directory to the folder where the data is saved.
If your username is "username" and you saved the files into a
folder called "Creating-maps-in-R-master" on your Desktop, for example,
you would type the following:

```{r, eval= F}
setwd("C:/Users/username/Desktop/Creating-maps-in-R-master/")
```

If you are working in RStudio, you can create a project that will automatically
set your working directory.
To do this click "Session" from the top
toolbar and select "Set working directory > choose directory".

It is also worth taking a look at the input data in your file browser
before opening them in R, to get a feel for them. You could try opening the
file "london_sport.shp", within the "data" folder of the project, in a GIS program such as
QGIS (which can be freely downloaded from the internet),
for example, to get a feel for it before loading it into R.
Also note that .shp files are composed of several files for each object:
you should be able to open "london_sport.dbf" in a spreadsheet program such as
LibreOffice Calc. Once you've understood something of this input data
and where it lives, it's time to open it in R.

## Loading the spatial data

One of the most important steps in handling spatial data with R
is the ability to read in spatial data, such as
[shapefiles](http://en.wikipedia.org/wiki/Shapefile)
(a common geographical file format). There are a number of ways to do this,
the most commonly used and versatile of which is `readOGR`.
This function, from the **rgdal** package, automatically extracts information
about the projection and the attributes of data.
**rgdal** is R’s interface to the "Geospatial Abstraction Library (GDAL)"
which is used by other open source GIS packages such as QGIS and enables
R to handle a broader range of spatial data formats. If you've not already
*installed* and loaded the rgdal package (as described above for ggplot2) do so now:

```{r, message=FALSE}
library(rgdal)
lnd_sport <- readOGR(dsn = "data", "london_sport")
```

In the code above `dsn` stands for "data source name" and is an *argument* of the *function* `readOGR`. Note that each new argument is separated by a comma.
The `dsn` argument in this case is a *character string*
(indicated by quote marks like this one `"`) that
specifies the directory where the data files are stored.
R functions have a default order of arguments, so `dsn = ` does not
actually need to be typed for the command to run. If the data were stored in the
current working directory, for example, one could use `readOGR(".", "london_sport")`.
For clarity, it is good practice to include argument names,
such as `dsn` when learning new functions and we continue this tradition below.

The next argument is another *character string*:
simply the name of the file required.
There is no need to add a file extension (e.g. `.shp`) in this case.

The files beginning `london_sport` in the `data/`
[directory](https://github.com/Robinlovelace/Creating-maps-in-R/tree/master/data)
contain the 2001 borough population and
percentage participating in sporting activities from the
[active people survey](http://data.london.gov.uk/datastore/package/active-people-survey-kpi-data-borough).
The boundary data is from the [Ordnance Survey](http://www.ordnancesurvey.co.uk/oswebsite/opendata/).

For information about how to load different types of spatial data,
the help documentation for `readOGR` is a good place to start. This can be accessed from
within R by typing `?readOGR`. For another worked example, in which a GPS trace is loaded,
please see Cheshire and Lovelace (2014).

## Basic plotting

We have now created a new spatial object called "sport" from the "london_sport" shapefile. Spatial objects are made up of a number of different *slots*, mainly the attribute *slot* and the geometry *slot*. The attribute *slot* can be thought of as an attribute table and the geometry *slot* is where the spatial object (and it's attributes) lie in space. Lets now analyse the sport object with some basic commands:

```{r}
head(lnd_sport@data, n = 2)
mean(lnd_sport$Partic_Per)
```

Take a look at this output and notice the table format of the data and the column names. There are two important symbols at work in the above block of code: the `@` symbol in the first line of code is used to refer to the attribute *slot* of the object. The `$` symbol refers to a specific attribute (a variable with a column name) in the `data` *slot*, which was identified from the result of running the first line of code. If you are using RStudio, test out the auto-completion functionality
by hitting `tab` before completing the command -
this can save you a lot of time in the long run.

The `head` function in the first line of the code above simply means "show the first few lines of data", i.e. the head. It's default is to output the first 6 rows of the dataset (try simply `head(lnd_sport@data)`),
but we can specify the number of lines with `n = 2` after the comma.
The second line of the code above calculates the mean value of the variable `Partic_Per` (sports participation per 100 people) for each of the zones in the `lnd_sport` object.
To explore `lnd_sport` object further, try typing `nrow(lnd_sport)` and record how many zones the dataset contains. You can also try `ncol(lnd_sport)`.

Now we have seen something of the attribute *slot* of the spatial object,
let us look at its *geometry*, which describes where
the polygons are located in space:


```{r, eval=FALSE}
plot(lnd_sport) # not shown in tutorial - try it on your computer
```

`plot` is one of the most useful functions in R, as it changes its behaviour
depending on the input data (this is called *polymorphism* by computer scientists).
Inputting another object such as `plot(lnd_sport@data)` will generate
an entirely different type of plot. Thus R is intelligent at guessing what you want to
do with the data you provide it with.

R has powerful subsetting capabilities that can be accessed very concisely using square brackets,
as shown in the following example:

```{r}
#select rows from attribute slot of lnd_sport object, where sports participation is less than 15.
lnd_sport@data[lnd_sport$Partic_Per < 15, ]
```

The above line of code asked R to select rows from the `lnd_sport` object, where sports participation is lower than 15,
in this case rows 17, 21 and 32, which are Harrow, Newham and the city centre respectively. The square brackets work as follows:
anything before the comma refers to the rows that will be selected, anything after
the comma refers to the number of columns that should be returned. For example if the data frame had 1000 columns and you were only interested in the first two columns you could specify `1:2` after the comma. The ":" symbol simply means "to", i.e. columns 1 to 2. Try experimenting with the square brackets notation
(e.g. guess the result of `lnd_sport@data[1:2, 1:3]` and test it): it will be useful.

So far we have been interrogating only the attribute *slot* (`@data`) of the `lnd_sport` object, but the square brackets can also be used to subset spatial objects, i.e. the geometry *slot*. Using the same logic as before try
to plot a subset of zones with high sports participation.

```{r, eval=FALSE}
#plot zones from sports object where sports participation is greater than 25.
plot(lnd_sport[lnd_sport$Partic_Per > 25, ]) # output not shown in tutorial
```

This is useful, but it would be great to see these sporty areas in context.
To do this, simply use the `add = TRUE` argument after the initial plot.
(`add = T` would also work, but we like to spell things out in this tutorial for clarity).
What does the `col` argument refer to in the below block - it should be obvious (see figure 2).

```{r fig.cap="Preliminary plot of London with areas of high sports participation highlighted in blue"}
plot(lnd_sport)
plot(lnd_sport[lnd_sport$Partic_Per > 25,], col = "blue", add = TRUE)
```

Congratulations! You have just interrogated and visualised a
spatial object: where are areas with high levels of sports
participation in London? The map tells us. Do not worry for now about
the intricacies of
how this was achieved: you have learned vital basics of how R works as a language;
we will cover this in more detail in subsequent sections.

While we are on the topic of loading data, it is worth pointing out
that R can save and load data efficiently into its own data format (`.RData`).
Try `save(lnd_sport, file = "sport.RData")` and see what happens.
If you type `rm(lnd_sport)` (which removes the object) and then `load("sport.RData")`
you should see how this works. `lnd_sport` will disappear from the workspace and then reappear.

## Attribute data

All shapefiles have both attribute table and geometry data. These are automatically loaded with
`readOGR`. The loaded attribute data can be treated the same as an R
[data frame](http://www.statmethods.net/input/datatypes.html).

R deliberately hides the geometry of spatial data unless you print
the entire object (try typing `print(lnd_sport)`).
Let's take a look at the headings of sport, using the following command: `names(lnd_sport)`
Remember, the attribute data contained in spatial objects are kept in a 'slot' that can be accessed using the `@` symbol: `lnd_sport@data`. This is useful if you do not wish to work with the spatial components of the data at all times.

Type `summary(lnd_sport)` to get some additional information about the
data object. Spatial objects in R contain
much additional information:

```
summary(lnd_sport)

## Object of class SpatialPolygonsDataFrame
## Coordinates:
## min max
## x 503571.2 561941.1
## y 155850.8 200932.5
## Is projected: TRUE
## proj4string :
## [+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 ....]
```

The above output tells us that `lnd_sport` is a special spatial class,
in this case a `SpatialPolygonsDataFrame`, meaning it is composed of
various polygons, each of which has attributes. This is the typical
class of data found in administrative zones. The coordinates tell
us what the maximum and minimum x and y values are, for plotting.
Finally, we are told something of the coordinate reference system
with the `Is projected` and `proj4string` lines.
In this case, we have a projected system, which means it is a
Cartesian reference system, relative to some point on the surface of the Earth.
We will cover reprojecting data in the next part of the tutorial.

# Part III: Manipulating spatial data

```{r, echo=FALSE}
# should be manipulating and plotting. TODO: talk about base graphics
```


It is all very well being able to load and interrogate spatial data
in R, but to compete with modern GIS packages, R must also be able
to modify these spatial objects
(see '[using R as a GIS](https://github.com/Pakillo/R-GIS-tutorial)').
R has a wide range of very powerful
functions for this, many of which reside in additional packages alluded
to in the introduction.

This course is introductory so only commonly required
data manipulation tasks, *reprojecting* and *joining/clipping* are covered here.
We will look at joining non-spatial
data to our spatial object. We will then cover spatial joins, whereby
data is joined to other dataset based on spatial location.

## Changing projection

Before undertaking spatial queries of an object, it is useful
to know the *coordinate reference system* (CRS) it uses.
You may have noticed the word `proj4string` in the
summary of the `lnd_sport` object above.
This represents it CRS mathematically.
In some spatial data files, no CRS is specified or
worse, and incorrect CRS value is given.
Provided the correct CRS is known, this can be righted with a single line:

```{r, warning=FALSE}
proj4string(lnd_sport) <- CRS("+init=epsg:27700")
```

R issues a warning when changing the CRS in this way to ensure the user
knows that they are simply changing the CRS, not *reprojecting* the data.
R uses [EPSG codes]() to refer to different coordinate reference systems.
`27700` is the code for British National Grid.
A commonly used geographical ('lat/lon')
CRS is 'WGS84', whose EPSG code is `4326`.
The following code shows how to search the list of available EPSG
codes and create a new version of `lnd_sport` in WGS84:^[Note:
entering `projInfo()` will provide additional CRS options
available from **rgdal**.]

```{r}
EPSG <- make_EPSG() # create data frame of available EPSG codes
EPSG[grepl("WGS 84$", EPSG$note), ] # search for WGS 84 code 
lnd_sport_wgs84 <- spTransform(lnd_sport, CRS("+init=epsg:4326")) # reproject
```

The above code uses the function `spTransform`, from the **sp** package,
to convert the `lnd_sport` object into a new form, with the Coordinate Reference System (CRS)
specified as WGS84.
The different epsg codes are a bit of hassle to remember but you can search for them at
[spatialreference.org](http://spatialreference.org/).

## Attribute joins

Attribute joins are used to link additional pieces of information to our polygons.
in the `lnd_sport` object, for example, we have 5 attribute variables - that can be
found by typing `names(lnd_sport)`. But what happens when we want to add an additional
variable from an external data table? We will use the example of recorded crimes by
borough to demonstrate this.

To reaffirm our starting point, let's re-load the
"london_sport" shapefile as a new object and plot it. This is identical to
the `lnd_sport` object in the first instance, but we will give it a new name,
in case we ever need to re-use `lnd_sport`.
We will call this new object
`lnd`, short for London:

```{r fig.cap="Plot of London"}
library(rgdal) # ensure rgdal is loaded
# Create new object called "lnd" from "london_sport" shapefile
lnd <- readOGR(dsn = "data", "london_sport")

plot(lnd) # plot the lnd object
nrow(lnd) # return the number of rows
```

```{r, eval=FALSE, echo=FALSE}
## Downloading additional data

# Because we are using borough-level data, and boroughs are official administrative
# zones, there is much data available at this level. We will use the example
# of crime data to illustrate this data availability, and join this with the current
# spatial dataset. As before, we can download and import the data from within R:
  
# download.file("http://data.london.gov.uk/datafiles/crime-community-safety/mps-
# recordedcrime-borough.csv", destfile = "mps-recordedcrime-borough.csv")
# uncomment and join the above code to download the data
crime_data <- read.csv("data/mps-recordedcrime-borough.csv")
head(crime_data)

# Initially, the `read.csv` may an error. If not the `head` command should show
# that the dataset has not loaded correctly. This was due to an unusual
# encoding used in the text file: hopefully you will not
# encounter this problem in your research, but it highlights the importance
# of checking the input data. To overcome this issue we
# can set the encoding manually, and continue.

# variant: markdown_github
```

The non-spatial data we are going to join to the `lnd` object
contains records on crimes in London. This is stored in a comma-delimited
[(`.csv`)] file called "mps-recordedcrime-borough".
Viewing the [file](https://raw.githubusercontent.com/Robinlovelace/Creating-maps-in-R/master/data/mps-recordedcrime-borough.csv)
locally shows that each row representing a single reported crime.
We are going to use a function called `aggregate`
to pre-process these records, ready to join to our spatial
`lnd` dataset. A new object called `crime_data` is created to store this data.

```{r, results='hide'}
# Create and look at new crime_data object
crime_data <- read.csv("data/mps-recordedcrime-borough.csv",
  fileEncoding = "UCS-2LE")

head(crime_data) # display first 6 lines
summary(crime_data$MajorText) # summary of crime type

# Extract "Theft & Handling" crimes and save
crime_theft <- crime_data[crime_data$MajorText == "Theft & Handling", ]
head(crime_theft, 2) # take a look at the result (replace 2 with 10 to see more rows)

# Calculate the sum of the crime count for each district and save result as a new object
crime_ag <- aggregate(CrimeCount ~ Spatial_DistrictName, FUN = sum,
                     data = crime_theft)
# Show the first two rows of the aggregated crime data
head(crime_ag, 2)
```

There is a lot going on in the above block of code and you should not expect
to understand all of it upon first try: simply typing the commands and thinking
briefly about the outputs is all that is needed at this stage to improve your
intuitive understanding of R. It is worth pointing out a few things
that you may not have seen before that will likely be useful in the future:

- In the first line of code the `fileEncoding` argument is used.
This is rarely necessary, but in this case the file comes in a strange file format.
9 times out of ten you can omit this argument but it's worth knowing about.
- The `which` function is used to select only those observations that
meet a specific condition, in this case all crimes involving "Theft and Handling".
- The
`~` symbol means "by": we aggregated the `CrimeCount` variable by the district name.

Now that we have crime data at the borough level (`Spatial_DistrictName`), the challenge is to join it to the `lnd` object. We will base our join on the `Spatial_DistrictName` variable from the `crime_ag` object and the `name` variable from the `lnd` object. It is not always straight forward to join objects based on names as the names do not always match. Let us see which names in the `crime_ag` object match the spatial data object, `lnd`:

```{r}
# Compare the name column in lnd to Spatial_DistrictName column in crime_ag to see which rows match.
lnd$name %in% crime_ag$Spatial_DistrictName
# Return rows which do not match
lnd$name[!lnd$name %in% crime_ag$Spatial_DistrictName]
```

The first line of code above uses the `%in%` command to
identify which values in `lnd$name` are also contained in the names of the
crime data. The results indicate that all but one of the borough names matches.
The second line of code tells us that it is City of London, row 25,
that is named differently in the
crime data. Look at the results (not shown here) on your computer.

```{r}
# Discover the names of the names
levels(crime_ag$Spatial_DistrictName)

# Rename row 25 in crime_ag to match row 25 in lnd, as suggested results form above
levels(crime_ag$Spatial_DistrictName)[25] <-
  as.character(lnd$name[!lnd$name %in% crime_ag$Spatial_DistrictName])
lnd$name %in% crime_ag$Spatial_DistrictName # now all columns match
```

The above code block first identified the row with the faulty name and
then renamed the level to match the `lnd` dataset. Note that we could not
rename the variable directly, as it is stored as a factor.

We are now ready to join the datasets. It is recommended to use
the `join` function in the **plyr** package but the `merge` function
could equally be used. Note that when we ask for help for a function
that is not loaded, nothing happens, indicating we need to load it:

```{r, results='hide'}
help(join) # error flagged
library(plyr)
help(join) # should now be loaded
```

The documentation for join will be displayed if the plyr package is loaded (if not,
load or install and load it!). It requires all joining variables to have the
same name, so we will rename the variable to make the join work:

```{r, results='hide'}
head(lnd$name)
head(crime_ag$Spatial_DistrictName) # the variables to join
crime_ag <- rename(crime_ag, replace = c("Spatial_DistrictName" = "name"))
head(join(lnd@data, crime_ag)) # test it works
lnd@data <- join(lnd@data, crime_ag)
```

Take a look at the `lnd@data` object. You should
see new variables added, meaning the attribute join
was successful.

## Clipping and spatial joins

In addition to joining by zone name, it is also possible to do
[spatial joins](http://help.arcgis.com/en/arcgisdesktop/10.0/help/index.html#//00080000000q000000)
in R. There are three main varieties: many-to-one, where
the values of many intersecting objects contribute to a new variable in
the main table, one-to-many, or one-to-one. Because boroughs in London
are quite large, we will conduct a many-to-one spatial join.
We will be using transport infrastructure points such as
tube stations and roundabouts as the spatial data to join,
with the aim of finding out about how many are found in each London borough.

```{r, results='hide'}
library(rgdal)
#create new stations object using the "lnd-stns" shapefile.
stations <- readOGR(dsn = "data", layer = "lnd-stns")
proj4string(stations) # this is the full geographical detail.
proj4string(lnd)
#return the bounding box of the stations object
bbox(stations)
#return the bounding box of the lnd object
bbox(lnd)
```

The above code loads the data correctly, but also shows that
there are problems with it: the Coordinate Reference System (CRS)
of `stations` differs from that of our `lnd` object.
OSGB 1936 (or [EPSG 27700](http://spatialreference.org/ref/epsg/27700/))
is the official CRS for the UK, so
we will convert the object to this:
 
```{r fig.cap="Sampling and plotting stations"}
# Create reprojected stations object
stations27700 <- spTransform(stations, CRSobj = CRS(proj4string(lnd)))
stations <- stations27700 # overwrite the stations object with stations27700
rm(stations27700) # remove the stations27700 object to clear up
plot(lnd) # plot London for context (see figure 4 below)
points(stations) # overlay the station points
```

Now we can clearly see that the `stations` points overlay the boroughs.
The problem is that the spatial extent of `stations` is
great than that of `lnd`.
We will take a spatially determined subset of the
stations object that fall inside greater London. This is *clipping*.

Two functions can be used to clip `stations`
so that only those falling within London boroughs are retained:
`sp::over`, and `rgeos::gIntersects` (the word preceding the `::` symbol refers to the package which the function is from).
Use `?` followed by the function to get help on each. Whether
`gIntersects` of `over` is needed depends
on the spatial data classes being compared (Bivand et al. 2013).

In this tutorial we will use the `over` function as it is easiest to use.
In fact, it can be called just by using square brackets:

```{r fig.cap="The clipped stations dataset"}
stations <- stations[lnd, ]
plot(stations) # test the clip succeeded (see figure 5)
```

```{r, echo=F,eval=FALSE}
save(lnd, file="data/lnd.RData")
save(stations, file="data/stations.RData")
```

The above line of code says: "output all `stations` within
the `lnd` object bounds". This is an incredibly concise way
of clipping and has the added advantage of being consistent
with R's syntax for non-spatial clipping.
To prove it worked, only stations within the London boroughs appear in the plot.

`gIntersects` can achieve the same result, but with more lines of code
(see [www.rpubs.com/RobinLovelace](http://www.rpubs.com/RobinLovelace/11796) for more on this) .
It may seem confusing that two different functions
can be used to generate the same result. However,
this is a common issue in R; the question
is finding the most appropriate solution.

In its less concise form (without use of square brackets),
`over` takes two main input arguments:
the target layer (the layer to be altered) and the
source layer by which the target layer is to be clipped.
The output of `over` is a data frame of the same
dimensions as the original object (in this case `stations`),
except that the points which fall outside the zone of interest are set to a value of `NA` ("no answer").
We can use this to make a subset of the original polygons,
remembering the square bracket notation described in the first section.
We create a new object, `sel` (short for "selection"),
containing the indices of all relevant polygons:

```{r, eval=FALSE}
sel <- over(stations, lnd)
stations <- stations[!is.na(sel[,1]),]
```

Typing `summary(sel)` should provide insight into how this
worked: it is a dataframe with 1801 NA values, representing
zones outside of the London polygon.
Note that the preceding two lines of code is equivalent to the
single line of code, `stations <- stations[lnd, ]`.
The next section demonstrates
spatial aggregation, a more advanced version of spatial subsetting.

## Spatial aggregation

As with R's very terse code for spatial subsetting, the base function
`aggregate` (which provides summaries of variables based on some grouping variable)
also behaves differently when the inputs are spatial objects.

```{r}
stations_agg <- aggregate(x = stations["CODE"], by = lnd, FUN = length)
head(stations_agg@data)
```

The above code performs a number of steps in just one line:

- `aggregate` identifies which `lnd` polygon (borough) each `station` is located in and groups them accordingly. The use of the syntax `stations["CODE"]` tells R that we are
interested in the spatial data from `stations` and its `CODE` variable (any variable
could have been used here as we are merely counting how many points exist).
- It counts the number of `stations` points in each borough, using the function `length`.
- A new spatial object is created, with the same geometry as `lnd`, and assigned the name `stations_agg`, the count of stations.

It may seem confusing that the result of the aggregated function is a new shape,
not a list of numbers - this is because values are assigned to the elements within
the `lnd` object. To extract the raw count data, one could enter `stations_agg$CODE`.
This variable could be added to the original `lnd` object as a new field, as follows:

```{r}
lnd$n_points <- stations_agg$CODE
```

As shown below, the spatial implementation of
`aggregate` can provide summary statistics of variables, as well as simple counts.
In this case we take the variable `NUMBER`
and find its mean value for the stations in each ward.

```{r}
lnd_n <- aggregate(stations["NUMBER"] , by = lnd, FUN = mean)
```

For an optional advanced task, let us analyse and plot the result.

```{r fig.cap="Choropleth map of mean values of stations in each borough"}
q <- cut(lnd_n$NUMBER, breaks= c(quantile(lnd_n$NUMBER)), include.lowest=T)
summary(q)
clr <- as.character(factor(q, labels = paste0("grey", seq(20, 80, 20))))
plot(lnd_n, col = clr) # plot (not shown in printed tutorial)
legend(legend = paste0("q", 1:4), fill = paste0("grey", seq(20, 80, 20)), "topright")
areas <- sapply(lnd_n@polygons, function(x) x@area)
```

This results in a simple choropleth map and a new vector containing the area of each
borough (the basis for figure 6). As an additional step, try comparing the mean
area of each borough with the
mean value of `stations` points within it: `plot(lnd_n$NUMBER, areas)`.

*Adding different symbols for tube stations and train stations*

Imagine that we want to now display all tube and train stations
on top of the previously created choropleth map. How would we do this?
The shape of points in R is determined by the `pch` argument, as demonstrated by the
result of entering the following code: `plot(1:10, pch=1:10)`.
To apply this knowledge to our map, we could add the following
code to the chunk added above (see figure 6):

```{r, eval=F}
levels(stations$LEGEND) # we want A roads and rapid transit stations (RTS)
sel <- grepl("A Road Sing|Rapid", stations$LEGEND) # selection for plotting 
sym <- as.integer(stations$LEGEND[sel]) # symbols
points(stations[sel,], pch = sym)
legend(legend = c("A Road", "RTS"), "bottomright", pch = unique(sym))
```

```{r fig.cap="Symbol levels for train station types in London", echo=FALSE}
q <- cut(lnd_n$NUMBER, breaks= c(quantile(lnd_n$NUMBER)), include.lowest=T)
clr <- as.character(factor(q, labels = paste0("grey", seq(20, 80, 20))))
plot(lnd_n, col = clr)
legend(legend = paste0("q", 1:4), fill = paste0("grey", seq(20, 80, 20)), "topright")
levels(stations$LEGEND)
sel <- grepl("A Road Sing|Rapid", stations$LEGEND) # selection for plotting 
sym <- as.integer(stations$LEGEND[sel]) # symbols
points(stations[sel,], pch = sym)
legend(legend = c("A Road", "RTS"), "bottomright", pch = unique(sym))
```

In the above block of code, we first identified which types of transport
points are present in the map with `levels` (this command only works on
factor data, and tells us the unique names of the factors that the vector can
hold). Next we select a subset of `stations` using a new command, `grepl`, to
determine which points we want to plot. Note that `grepl`'s first argument
is a text string (hence the quote marks) and that the second is a factor
(try typing `class(stations$LEGEND)` to test this).
`grepl` uses *regular expressions* to match whether each element in a vector
of text or factor names match the text pattern we want. In this case,
because we are only interested in roundabouts that are A roads and
Rapid Transit systems (RTS). Note the use of the vertical separator `|` to
indicate that we want to match `LEGEND` names that contain either "A Road"
*or* "Rapid". Based on the positive matches (saved as `sel`, a vector of
`TRUE` and `FALSE` values), we subset the stations. Finally we plot these as points,
using the integer of their name to decide the symbol and add a legend.
(See the documentation of `?legend` for detail on the complexities of
legend creation in R's base graphics.)

This may seem a frustrating and un-intuitive way of altering
map graphics compared with something like QGIS. That's because it is!
It may not worth pulling too
much hair out over R's base graphics because there is another
option. Please skip to Section IV if you're itching to see this
more intuitive alternative.

## Optional task: aggregation with gIntersects

```{r, echo=FALSE}
# This should be a separate vignette/rpubs doc
```

As with clipping, we can also do spatial aggregation with
the rgeos package. In some ways, this method makes explicit
the steps taken in `aggregate` 'under the hood'.
The code is quite involved and intimidating, so feel free to
skip this stage. Working through and thinking about it this alternative method may, however,
yield dividends if you intend to perform more sophisticated spatial analysis in R.

```{r, results='hide'}
library(rgeos) 
int <- gIntersects(stations, lnd, byid = TRUE) # re-run the intersection query
head(apply(int, MARGIN = 2, FUN = which))
b.indexes <- which(int, arr.ind = TRUE) # indexes that intersect
summary(b.indexes)
b.names <- lnd$name[b.indexes[, 1]]
b.count <- aggregate(b.indexes ~ b.names, FUN = length)
head(b.count)
```

The above code first extracts the index of the row (borough) for
which the corresponding column is true and then converts this into
names. The final object created, `b.count` contains the number of station
points in each zone. According to this, Barking and Dagenham should contain
12 station points. It is important to check the output makes sense at
every stage with R, so let's check to see this is indeed the case with
a quick plot:

```{r fig.cap="Transport points in Barking and Dagenham"}
plot(lnd[grepl("Barking", lnd$name),])
points(stations)
```

Now the fun part: count the points in the polygon and report back how many there are!

We have now seen how to load, join and clip data. The second half of this tutorial
is concerned with *visualisation* of the results. For this, we will use
**ggplot2** and begin by looking at how it handles non-spatial data.

# Part IV: Map making with ggplot2

This third part introduces a slightly
different method of creating plots in R using the
[ggplot2 package](http://ggplot2.org/), and explains how it can make maps.
The package is an implementation of the Grammar of Graphics (Wilkinson 2005) -
a general scheme for data visualisation that breaks up graphs
into semantic components such as scales and layers.
**ggplot2** can serve as a replacement for the base graphics in R (the functions you have been plotting with today) and contains a number of default options that match good visualisation practice.

The maps we produce will not be that meaningful -
the focus here is on sound visualisation with R and not sound analysis
(obviously the value of the former diminished in the absence of the latter!)
Whilst the instructions are step by step you are encouraged to deviate from them
(trying different colours for example) to get a better understanding
of what we are doing.

**ggplot2** is one of the best documented packages in R.
The full documentation for it can be found online and it is recommended you
test out the examples on your own machines and play with them:
http://docs.ggplot2.org/current/ .

Good examples of graphs can also be found on the website
[cookbook-r.com](http://www.cookbook-r.com/Graphs/).

Load the package:

```{r}
library(ggplot2)
```

It is worth noting that the basic `plot()` function requires no
data preparation but additional effort in colour selection/adding the map key etc.
`qplot()` and `ggplot()` (from the **ggplot2** package)
require some additional steps to format the spatial data but select
colours and add keys etc. automatically. More on this later.

As a first attempt with **ggplot2** we can create a scatter plot with the attribute data in the `lnd` object created above. Type:

```{r}
p <- ggplot(lnd@data, aes(Partic_Per, Pop_2001))
```

What you have just done is set up a ggplot object where
you say where you want the input data to come from.
`lnd@data` is actually a data frame contained within the
wider spatial object `lnd` (the `@` enables you to
access the attribute table of the shapefile). The characters inside the `aes` argument
refer to the parts of that data frame you wish to use (the variables `Partic_Per` and `Pop_2001`).
This has to happen within the brackets of `aes()`, which means,
roughly speaking 'aesthetics that vary'.

If you just type p and hit enter you get the error `No layers in plot`.
This is because you have not told ggplot what you want
to do with the data. We do this by adding so-called "geoms",
in this case `geom_point()`.

```{r fig.cap="A simple ggplot"}
p + geom_point()
```

Within the brackets you can alter the nature of the points. Try something like `p + geom_point(colour = "red", size=2)` and experiment.

If you want to scale the points by borough population and colour them by sports participation this is also fairly easy by adding another `aes()` argument.

```{r fig.cap="ggplot with aesthetics", eval=FALSE}
p + geom_point(aes(colour=Partic_Per, size=Pop_2001))
```

The real power of **ggplot2** lies in its ability to add layers to a plot. In this case we can add text to the plot.

```{r fig.cap="ggplot for text"}
p + geom_point(aes(colour = Partic_Per, size = Pop_2001)) + geom_text(size = 2, aes(label = name))
```

This idea of layers (or geoms) is quite different from the standard plot functions in R, but you will find that each of the functions does a lot of clever stuff to make plotting much easier (see the documentation for a full list).
 
The following steps will create a map to show the percentage of the population in each London Borough who regularly participate in sports activities.

## "Fortifying" spatial objects for ggplot2 maps

To get the shapefiles into a format that can be plotted we have to use the `fortify()` function. Spatial objects in R have a number of slots containing the various items of data (polygon geometry, projection, attribute information) associated with a shapefile. Slots can be thought of as shelves within the data object that contain the different attributes. The "polygons" slot contains the geometry of the polygons in the form of the XY coordinates used to draw the polygon outline. The generic plot function can work out what to do with these, **ggplot2** cannot. We therefore need to extract them as a data frame. The fortify function was written specifically for this purpose.
For this to work, either **maptools** or **rgeos** packages must be installed.

```{r, warning=FALSE}
lnd_f <- fortify(lnd, region = "ons_label") # you may need to load maptools
```

This step has lost the attribute information associated with the lnd object. We can add it back using the merge function (this performs a data join). To find out how this function works look at
the output of typing `?merge`.

```{r}
lnd_f <- merge(lnd_f, lnd@data, by.x = "id", by.y = "ons_label")
```

Take a look at the `sport.f` object to see its contents. You should see a large data frame containing the latitude and longitude (they are actually Easting and Northing as the data are in British National Grid format) coordinates alongside the attribute information associated with each London Borough. If you type `print(lnd_f)` you will see just how many coordinate pairs are required!
To keep the output to a minimum, take a peek at the object using just the `head` command:

```{r}
head(lnd_f[, 1:8])
```

It is now straightforward to produce a map using all the built in tools
(such as setting the breaks in the data) that **ggplot2** has to offer.
`coord_equal()` is the equivalent of asp=T in regular plots with R:

```{r fig.cap="Map of Lond Sports Participation"}
map <- ggplot(lnd_f, aes(long, lat, group = group, fill = Partic_Per)) +
  geom_polygon() +
  coord_equal() +
  labs(x = "Easting (m)", y = "Northing (m)", fill = "% Sports\nParticipation") +
  ggtitle("London Sports Participation")
```

Now, just typing `map` should result in your first ggplot-made map of London!
There is a lot going on in the code above, so think about it line by line:
what have each of the elements of code above been designed to do?
Also note how the `aes()` components can be combined into one set of brackets
after `ggplot`, that has relevance for all layers, rather than being
broken into separate parts as we did above.
The different plot functions still know what to do with these.
The `group=group` points ggplot to the group column added by
`fortify()` and it identifies the groups of coordinates that pertain
to individual polygons (in this case London Boroughs).

The default colours are really nice but we may wish to produce the map in black and white,
which should produce a map like that shown below (and try changing the colors):

```{r fig.cap="Greyscale map"}
map + scale_fill_gradient(low = "white", high = "black")
```

Saving plot images is also easy. You just need to use `ggsave` after each plot, e.g.
`ggsave("my_map.pdf")` will save the map as a pdf, with default settings. For
a larger map, you could try the following:

```{r, eval=FALSE}
ggsave("large_plot.png", scale = 3, dpi = 400)
```

## Adding base maps to ggplot2 with ggmap

[ggmap](http://journal.r-project.org/archive/2013-1/kahle-wickham.pdf)
is a package that uses the **ggplot2** syntax as a
template to create maps with image tiles taken from map servers
such as Google and
[OpenStreetMap](http://www.openstreetmap.org/):

```{r}
library(ggmap) # you may have to use install.packages to install it first
```

The `lnd` object loaded previously is in British National Grid but the ggmap
image tiles are in WGS84. We therefore need to use the `lnd_sport_wgs84`
object created in the reprojection operation earlier.

The first job is to calculate the bounding box (bb for short) of the
`lnd_sport_wgs84` object to identify the geographic extent of the image tiles that we need.

```{r}
b <- bbox(lnd_sport_wgs84)
b[1, ] <- (b[1, ] - mean(b[1, ])) * 1.05 + mean(b[1, ])
b[2, ] <- (b[2, ] - mean(b[2, ])) * 1.05 + mean(b[2, ])
# scale longitude and latitude (increase bb by 5% for plot)
# replace 1.05 with 1.xx for an xx% increase in the plot size
```

This is then fed into the `get_map` function as the location parameter. The syntax below contains 2 functions. **ggmap** is required to produce the plot and provides the base map data.

```{r, message=FALSE, eval=FALSE}
lnd_b1 <- ggmap(get_map(location = b)) # create basemap for london
```

```{r, echo=FALSE, eval=FALSE}
# save(lnd_b1, file = "data/lnd_b1.RData")
load("data/lnd_b1.RData") # load saved map
```

In much the same way as we did above we can then layer the plot with different geoms.

First fortify the `lnd_sport_wgs84` object and then merge with the required attribute
data (we already did this step to create the `lnd_f` object).

```{r}
lnd_wgs84_f <- fortify(lnd_sport_wgs84, region = "ons_label")
lnd_wgs84_f <- merge(lnd_wgs84_f, lnd_sport_wgs84@data,
                      by.x = "id", by.y = "ons_label")
```

We can now overlay this on our base map.

```{r, eval=FALSE}
lnd_b1 +
  geom_polygon(data = lnd_wgs84_f,
               aes(x = long, y = lat, group = group, fill = Partic_Per),
               alpha = 0.5)
```

The code above contains a lot of parameters. Use the **ggplot2** help pages to find out what they are.
The resulting map looks okay, but it would be improved with a simpler base map in black and white.
A design firm called stamen provide the tiles we need and they can be brought into the
plot with the `get_map` function:

```{r, eval=FALSE}
# download basemap (use load("data/lnd_b2.RData") if you have no internet)
lnd_b2 <- ggmap(get_map(location = b, source = "stamen",
          maptype = "toner", crop = TRUE))
```

```{r, echo=FALSE}
# save(lnd_b2, file = "data/lnd_b2.RData")
load("data/lnd_b2.RData") # load saved map
```

We can then produce the plot as before:

```{r fig.cap="Basemap 2"}
library(mapproj) # mapproj library needed - install.packages("mapproj")

lnd_b2 +
  geom_polygon(data = lnd_wgs84_f,
               aes(x = long, y = lat, group = group, fill = Partic_Per),
               alpha = 0.5)
```

Finally, to increase the detail of the base map, we can use `get_map`'s `zoom` argument
(result not shown)

```{r, eval=FALSE}
# download basemap (try load("data/lnd_b3.RData") if you lack internet)
lnd_b3 <- ggmap(get_map(location = b, source = "stamen",
  maptype = "toner", crop = TRUE, zoom = 11))
```

```{r, eval=F, echo=FALSE}
# save(lnd_b3, file = "data/lnd_b3.RData")
# load("data/lnd_b3.RData") # load saved map
```

```{r fig.cap="Basemap 3", eval=FALSE}
lnd_b3 +
  geom_polygon(data = lnd_wgs84_f,
               aes(x = long, y = lat, group = group, fill = Partic_Per),
               alpha = 0.5)
```

## Advanced Task: Faceting for Maps

```{r}
library(reshape2) # this may not be installed.
# If not install it, or skip the next two steps
```

Load the data - this shows historic population values between 1801 and 2001
for London, again from the London data store.

```{r}
london_data <- read.csv("data/census-historic-population-borough.csv")
```

"Melt" the data so that the columns become rows.

```{r}
lnd_molten <- melt(london_data, id = c("Area.Code", "Area.Name"))
```


```{r, echo=FALSE, eval=FALSE}
# lnd_molten <- read.csv("london_data_melt.csv")
# Only do this step if reshape and melt failed
```

Merge the population data with the London borough geometry contained within our `lnd_f` object.

```{r}
lnd_f <- merge(lnd_f, lnd_molten, by.x = "id", by.y = "Area.Code")
```

Reorder this data (ordering is important for plots).

```{r}
lnd_f <- lnd_f[order(lnd_f$order), ]
```

We can now use faceting to produce one map per year
(this may take a little while to appear as displayed
in figure 13).

```{r fig.cap="Faceted map", fig.height=10, fig.width=6}
ggplot(data = lnd_f, aes(x = long, y = lat, fill = value, group = group)) +
  geom_polygon() +
  geom_path(colour="grey", lwd=0.1) +
  coord_equal() +
  facet_wrap(~ variable)
```

Again there is a lot going on here so explore the documentation to make sure you understand it.
Try out different colour values as well.

Add a title and replace the axes names with "easting" and
"northing" and save your map as a pdf.

# Part V: Taking spatial data analysis in R further

The skills taught in this tutorial are applicable to a very wide
range of situations, spatial or not. Often experimentation is the
most rewarding learning method, rather than just searching for the
'best' way of doing something (Kabakoff, 2011). We recommend you play around with your data.

If you would like to learn more about R's spatial functionalities,
including more exercises on loading, saving and manipulating data,
we recommend a slightly longer and
more advanced tutorial (Cheshire and Lovelace, 2014).
An up-to-date repository of this project,
including example data and all the code used,
can be found on its GitHub page:
[github.com/geocomPP/sdvwR](https://github.com/geocomPP/sdvwR).
There are also a number of bonus 'vignettes' associated with
the present tutorial. These can be found on the
[vignettes page](https://github.com/Robinlovelace/Creating-maps-in-R/tree/master/vignettes) of the project's repository.

Another advanced tutorial is "Using spatial data", which has example
code and data that can be downloaded from the
[useR 2013 conference page](http://www.edii.uclm.es/~useR-2013//Tutorials/Bivand.html).
Such lengthy tutorials are worth doing to think about spatial data in R systematically,
rather than seeing R as a discrete collection of functions. In R the whole is greater than
the sum of its parts.

The supportive online communities surrounding large open source programs such as R
are one of their greatest assets, so we recommend you become an active
"[open source citizen](http://blog.cleverelephant.ca/2013/10/being-open-source-citizen.html)" rather than a passive consumer (Ramsey & Dubovsky, 2013).

This does not necessarily mean writing a new package or contributing
to R's 'Core Team' - it can simply involve helping
others use R. We therefore conclude the tutorial with a list of resources
that will help you further sharpen you R skills, find help and contribute
to the growing online R community:

* R's homepage hosts a wealth of [official](http://cran.r-project.org/manuals.html) and [contributed](http://cran.r-project.org/other-docs.html) guides.
* Stack Exchange and GIS Stack Exchange groups - try searching for "[R]". If your issue has not been not been addressed yet, you could post a polite question.
* R's [mailing lists](http://www.r-project.org/mail.html) - the R-sig-geo list may be of particular interest here.

Books: despite the strength of R's online community, nothing beats a physical book for concentrated learning. We would particularly recommend the following:

 * ggplot2: elegant graphics for data analysis (Wickham 2009).
 * Bivand et al. (2013) Provide a dense and detailed overview of spatial
 data analysis.
 * Kabacoff (2011) is a more general R book; it has many fun worked examples.
 
# R quick reference

`#`: comments all text until line end

`df <- data.frame(x = 1:9, y = (1:9)^2`: create new object of class `data.frame`, called df, and assign values

`help(plot)`: ask R for basic help on function, the same as `?plot`. Replace `plot` with any function (e.g. `spTransform`).

`library(ggplot2)`: load a package (replace **ggplot2** with your package name)

`install.packages("ggplot2")`: install package - note quotation marks

`setwd("C:/Users/username/Desktop/")`: set R's *working directory* (set it to your project's folder)

`nrow(df)`: count the number of rows in the object `df`

`summary(df)`: summary statistics of the object `df`

`head(df)`: display first 6 lines of object `df`

`plot(df)`: plot object `df`

`save(df, "C:/Users/username/Desktop/" )`: save df object to specified location

`rm(df)`: remove the `df` object

`proj4string(df)`: query coordinate reference system of `df` object

`spTransform(df, CRS("+init=epsg:4326")`: reproject `df` object to WGS84

# Aknowledgements

The tutorial was developed for
a series of Short Courses funded by the National Centre for Research Methods (NCRM),
via the TALISMAN node (see [geotalisman.org](http://www.geotalisman.org/)).
Thanks to the [ESRC](http://www.esrc.ac.uk/) for funding applied methods research.
Many thanks to Rachel Oldroyd and Alistair Leak who helped demonstrate
these materials on the NCRM short courses for which
this tutorial was developed. Amy O'Neill organised the course
and encouraged feedback from participants.
The final thanks is to all users and developers of open source software
for making powerful tools such as R accessible and enjoyable to use.

# References

Bivand, R. S., Pebesma, E. J., & Rubio, V. G. (2008). Applied spatial data: analysis with R. Springer.

Cheshire, J. & Lovelace, R. (2014). Manipulating and visualizing spatial data with R. Book chapter in Press.

Harris, R. (2012). A Short Introduction to R.
[social-statistics.org](http://www.social-statistics.org/).

Johnson, P. E. (2013). R Style. An Rchaeological Commentary. The Comprehensive R Archive Network.

Kabacoff, R. (2011). R in Action. Manning Publications Co.

Ramsey, P., & Dubovsky, D. (2013). Geospatial Software's Open Future.
GeoInformatics, 16(4).

Torfs and Brauer (2012). A (very) short Introduction to R. The Comprehensive R Archive Network.

Wickham, H. (2009). ggplot2: elegant graphics for data analysis. Springer.

Wickham, H. (in press). [Advanced R](http://adv-r.had.co.nz/). CRC Press.

Wilkinson, L. (2005). The grammar of graphics. Springer.

```{r, echo=FALSE}
system("mv intro-spatial.pdf intro-spatial-rl.pdf") # change name
```