-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathgeodata_workshop.Rmd
606 lines (435 loc) · 24.5 KB
/
geodata_workshop.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
---
title: "Working with Geo-Data"
subtitle: "NaWi-Workshop: Obtaining, linking and plotting geographic data"
author: "Markus Konrad"
date: "May 6, 2019"
output:
ioslides_presentation:
logo: slides_custom/wzbfirmierungquerschwarzmitschrift.png
css: slides_custom/custom.css
---
```{r, include=FALSE}
knitr::opts_chunk$set(R.options = list(max.print = 50))
set.seed(10042019)
```
## Today's schedule
1. Introduction
2. A short introduction to plotting with *ggplot2*
3. Hello world: Plotting points on a world map
4. Sources and file formats for geographic data
-- lunch break --
<ol start="5">
<li>A short introduction to data linkage with *dplyr*</li>
<li>Combining data and making a *choropleth map*</li>
<li>Geocoding and reverse geocoding via Google Maps API</li>
<li>Making your maps publication-ready</li>
</ol>
# Introduction
## About me {.build}
- working at WZB as Data Scientist in IT dept. since April 2016
- my job:
- data extraction and preparation (e.g. data retrieval from APIs, web scraping, data extraction from PDFs)
- quantitive text analysis (Topic modeling)
- interactive and static visualizations
- implementing experiments (esp. w/ oTree)
- designing and implementing research databases
- apparently giving workshops
contact: [email protected] | tel -555 | office D005
## Aims of today's workshop {.build}
- will show how to use R with *geo-spatial data*
- main focus on visualization; no spatial data analysis
- sources and file formats for geographic data
- combine your data with existing geographic data
Why use R for that?
- for simple things, you can use online tools like [datawrapper](https://www.datawrapper.de/)
- but: R gives you several advantages over these tools
- it's for free
- offers full flexibility
- also supports geo-spatial analysis
- no need to switch tools
# A short introduction to plotting with *ggplot2*
## Plotting with *ggplot2* {.build}
- *ggplot2*: versatile package to create elegant plots with R
- based on a theory for declaratively creating graphics called *"Grammar of Graphics"* (Wilkinson 2005)
- documentation on https://ggplot2.tidyverse.org/
<br>
- ggplot works by providing a dataset and mapping its variables to visual properties, e.g. variable `age` should be on the x-axis, `income` should be on the y-axis
- with the same principle, maps can be made, e.g.: variable `GDP` determines the color of a country
## Plotting with *ggplot2*: Example dataset {.build .smaller}
We'll have a look at the principles of ggplot2 in general, which we can later apply to making maps.
We'll make a plot for historical election results from four randomly selected states in the US (dataset `presidentialElections` from package *pscl*):
```{r, echo=FALSE, warning=FALSE, message=FALSE}
library(pscl)
library(dplyr)
sampled_states <- presidentialElections %>%
distinct(state, south) %>%
group_by(south) %>% # group by south / not south
sample_n(2) %>% # take 2 from each group
ungroup() %>%
pull(state)
sampled_pres <- filter(presidentialElections, state %in% sampled_states)
sampled_pres
```
## Plotting with *ggplot2*: Step 1 {.build .smaller}
Step 1: Load ggplot2 and pass the dataset.
```{r, warning=FALSE, message=FALSE, fig.height=4, fig.align='center'}
library(ggplot2)
ggplot(sampled_pres)
```
## Plotting with *ggplot2*: Step 2 {.build .smaller}
Step 2: Specify an **aesthetic mapping** via `aes()`.<br>
This describes how variables of your data are mapped to visual properties: "year" is plotted on the x-axis and the share of votes for Democrats on the y-axis.
```{r, warning=FALSE, message=FALSE, fig.height=4, fig.align='center'}
ggplot(sampled_pres, aes(x = year, y = demVote))
```
## Plotting with *ggplot2*: Step 3 {.build .smaller}
Step 3: Adding a **geom**.<br>
geoms are geometric objects; they describe which graphical primitives to use (e.g. points in a scatter plot or bars in a bar plot). Notice that this plot looks wrong. We'll fix that next.
```{r, warning=FALSE, message=FALSE, fig.height=4, fig.align='center'}
ggplot(sampled_pres, aes(x = year, y = demVote)) + geom_line()
```
## Plotting with *ggplot2*: Step 4 {.build .smaller}
Step 4: Fixing the aesthetic mapping.<br>
The previous plot looked wrong because several y-values appeared for the same years and **they were all connected by a single line**: For each year, we have a y-value (vote share) for each state! We adjust the **aesthetic mapping** by making the line color dependent on `state`, hence we have a line (of different color) per state:
```{r, warning=FALSE, message=FALSE, fig.height=4, fig.align='center'}
ggplot(sampled_pres, aes(x = year, y = demVote, color = state)) + geom_line()
```
## Plotting with *ggplot2*: Step 5 {.build .smaller}
Step 5: Adding another layer.<br>
We can add more geometric objects by adding another `geom_` layer. For better readability, I put the layer functions on separate lines:
```{r, warning=FALSE, message=FALSE, fig.height=4, fig.align='center'}
ggplot(sampled_pres, aes(x = year, y = demVote, color = state)) +
geom_line() +
geom_point()
```
## Plotting with *ggplot2*: Datasets per layer {.build .smaller}
- when you pass a dataset and aesthetic mapping to `ggplot()` it used by all layers as default
- you can override it in each `geom_...` layer by setting `aes(...)` and `data = ...`
- each layer may actually use a different dataset; when plotting maps, this is often used (e.g. one layer with data for a "background map", another for points on it)
```{r, warning=FALSE, message=FALSE, fig.height=3, fig.align='center'}
random_dots <- data.frame(x = sample(sampled_pres$year, 10),
y = rnorm(10, mean = 50, sd = 10))
ggplot() +
geom_line(aes(x = year, y = demVote, color = state), data = sampled_pres) +
geom_point(aes(x = x, y = y), data = random_dots) # use a different dataset here
```
## Further customizations {.build}
Additionally, you can further change the appearance of your plot by:
- altering the **scales** (e.g. use a logarithmic scale, modify display of factors, etc.)
- defining **facets** → create *small multiples*, each showing a different subset of the data
- changing the overall appearance of the plot by adjusting its **theme** (e.g. change background color, rotate axis labels, etc.)
You combine all these steps with a `+`.
## Exercise 1
**1. Making a scatterplot (for those new to ggplot)**
- load `ggplot2` and inform yourself about the `msleep` dataset
- make a scatterplot with `bodywt` on x-axis, `sleep_total` on y-axis
- improve the plot by:
- transforming the x scale to log10 using `scale_x_log10()`
- making the dot color dependent on `vore`
# Hello world: Plotting points on a world map
## Coordinate reference systems in brief {.smaller .build}
We usually work with two-dimensional *vector data*: Points located within a *coordinate reference system (CRS)*.
```{r, message=FALSE, warning=FALSE, echo=FALSE, fig.width=6, fig.height=2.5, fig.align='center'}
library(ggplot2)
library(maps)
library(sf)
nyc <- st_sfc(st_point(c(-73.94, 40.6)), crs = 4326)
ggplot() + geom_sf(data = nyc, color = 'red', size = 3) +
geom_sf_text(data = nyc, label = "-73.94, 40.6", hjust = 0, vjust = 1, nudge_x = 3) +
coord_sf(xlim = c(-180, 180), ylim = c(-90, 90), expand = FALSE) +
geom_hline(yintercept = 0, color = 'green') + geom_vline(xintercept = 0, color = 'blue') +
xlab('longitude') + ylab('latitude') +
theme_bw()
```
- the point *-73.94°, 40.6°* is represented in the *WGS84* CRS: the world geodetic system (e.g. used by GPS)
- points describe location on a sphere¹; its components are given in degrees:
- *-73.94°* is the **longitude**: about 74° West from the <span style="color:blue">meridian</span>
- *40.6°* is the **latitude**: about 41° North from the <span style="color:green">equator</span>
<small>¹ WGS84 actually models the Earth as ellipsoid</small>
## Coordinate reference systems in brief {.smaller .build}
Let's put this point into context:
```{r, warning=FALSE, fig.width=6, fig.height=3, fig.align='center', echo=FALSE}
worldmap_data <- st_as_sf(map('world', plot = FALSE, fill = TRUE))
ggplot() + geom_sf(data = worldmap_data) + geom_sf(data = nyc, color = 'red', size = 3) +
geom_sf_text(data = nyc, label = "-73.94, 40.6", hjust = 0, vjust = 1, nudge_x = 3) +
coord_sf(xlim = c(-180, 180), ylim = c(-90, 90), expand = FALSE) +
geom_hline(yintercept = 0, color = 'green') + geom_vline(xintercept = 0, color = 'blue') +
xlab('longitude') + ylab('latitude') +
theme_bw()
```
- points can be connected to form other geometric shapes such as *lines* or *polygons*
- points and shapes can represent numerous geographic entities: cities, buildings, countries, rivers, lakes, etc.
## Packages, packages, packages {.build}
We need to extend R in order to work with geographic data *(geo-data)* in R by installing these packages:
- *maps*: contains geo-data for national borders and administrative regions for several countries
- *rgeos* and *maptools*: Misc. functions for operations on geometries
- *sf*: *"Simple Features for R"* -- reading, writing and transforming spatial data
Install them if you haven't yet:
```{r, eval=FALSE}
install.packages(c('maps', 'maptools', 'sf', 'rgeos'))
```
## Simple features {.build .smaller}
Most packages for working with geo-data in R rely on the *Simple Features* standard (→ `sf` package).
*Simple features* describe how objects in the real world can be represented in computers in terms of their *spatial geometry*:
> Features have a *geometry* describing *where* on Earth the feature is located, and they have attributes, which describe other properties.<br>
> -- [`sf` package vignette](https://cran.r-project.org/web/packages/sf/vignettes/sf1.html#what_is_a_feature)
Examples:
<div style="width:50%; float:left;">
- geometry: point at *-73.94°, 40.6°*
- name: New York City
- population: 8.6 mio.
</div>
<div style="width:50%; float:left;">
- geometry: polygons with points at ...
- name: Italy
- most popular meal: pizza
</div>
<div style="clear:left">
A **spatial dataset** is a dataset that contains geo-data (in a `geometry` column) *and* attributes of that entity like population, poverty rate, etc.
</div>
## Making a world map {.build .smaller}
First, we load the packages that we need:
```{r, warning=FALSE, message=FALSE}
library(ggplot2)
library(maps)
library(sf)
```
## Making a world map {.build .smaller}
The function `map` can be used to load the "world" data. We need to convert it to a "simple features" object via `st_as_sf()`:
```{r}
worldmap_data <- st_as_sf(map('world', plot = FALSE, fill = TRUE))
head(worldmap_data, n = 3) # to have a look inside
```
- **spatial dataset** with a "header" giving some general information
- two columns: `geometry` (spatial data) and `ID` (attribute)
What do the numbers in the `geometry` column mean? What's their CRS?
## Making a world map {.smaller}
We are ready to plot the world map. Every "simple features" object can be plotted by adding a `geom_sf` layer:
```{r, fig.width=8}
ggplot() + geom_sf(data = worldmap_data)
```
## Putting points on the map {.smaller .build}
We have some cities along with their longitude (`lng`) and latitude (`lat`):
```{r, echo=FALSE}
some_cities <- data.frame(name = c('Berlin', 'New York', 'Sydney'),
lng = c(13.38, -73.94, 151.21),
lat = c(52.52, 40.6, -33.87))
some_cities
```
We add a "point geom" layer to our map:
```{r, fig.width=6, fig.height=3, fig.align='center'}
ggplot(some_cities) + # pass the cities data to plot
geom_sf(data = worldmap_data) + # layer 1: world countries
geom_point(aes(x = lng, y = lat), color = 'red') # layer 2: points at city coord.
```
## Adding labels next to the points {.smaller .build}
You may also add text labels for the cities. *ggplot2* provides `geom_text` and `geom_label` (draws box around text):
```{r, fig.width=6, fig.height=3, fig.align='center'}
ggplot(some_cities) + # pass the cities data to plot
geom_sf(data = worldmap_data) + # layer 1: world countries
geom_point(aes(x = lng, y = lat), color = 'red') + # layer 2: points
geom_label(aes(x = lng, y = lat, label = name), # layer 3: labels
hjust = 0, vjust = 1, nudge_x = 3) + # labels appear next to point
theme(axis.title = element_blank(),
axis.text = element_blank()) # disable the axis labels
```
## Exercises 2 and 3 {.even-smaller}
**Choose *one* of the following two tasks (or make both if you like):**
**2. (Birth-)places on a map**
- form a small group (3 to 4 people) and make a dataset with your birthplaces (or favourite travel destinations or whatever you like) and their geo-coordinates
- see handout on how to find out the geo-coordinates for any place
- your dataset should have three columns: longitude, latitude, label
- plot the points on a world map and add labels to them
- if your points are too near to each other on a global scale:
- consider filtering the "worldmap" dataset to contain only the country you need for plotting
- or: restrict the display window (see hints in handout)
**3. Visualizing WZB partner institutions on a world map**
- load the CSV file `wzb_partners_data.csv` from folder `3_wzb_partners` and the data for the world map
- remove the shape (i.e. the "feature") for Antarctica from the world map data
- make a plot, showing the partner institutions with a dot on the world map (optional: make the color dependent on the "type" column)
- optional: group and count institutions per city; show the city locations in Europe (restrict the display window -- see hints in handout) only as a dot, make the dot size dependent on number of institutions
# Sources and file formats for geo-data
## Administrative levels and identifiers {.smaller .build}
In order to ...
- **link** your data with existing information on geographic entities e.g. country GDP, city population, poverty rate in the neighboorhood
- **visualize** your data geographically
... you need to:
1. agree on **(administrative) level** for matching (e.g. country, state, municipality)
2. find **identifiers** to match the observations w/ respective geographic entities
```{r, echo=FALSE, message=FALSE, warning=FALSE, fig.height=3, fig.align='center'}
library(ggrepel)
nutsrg123 <- lapply(1:3, function(lvl) {
nutsrg <- read_sf(sprintf('data/nutsrg_%d.json', lvl))
nutsrg$level <- lvl
st_crs(nutsrg) <- 3857
nutsrg
})
nutsrg123 <- do.call(rbind, nutsrg123)
nutsrg123$level <- as.ordered(nutsrg123$level)
nuts_brugge <- nutsrg123[nutsrg123$id %in% c('BE2', 'BE25', 'BE251'),]
nuts_brugge <- cbind(nuts_brugge, st_coordinates(st_centroid(nuts_brugge$geometry)))
nuts_brugge$label <- paste0('Level ', nuts_brugge$level, ', ID=', nuts_brugge$id, '\n', nuts_brugge$na)
ggplot() +
geom_sf(size = 0.5, alpha = 0.25, data = nutsrg123) +
geom_sf(aes(fill = level), data = nuts_brugge) +
geom_label_repel(aes(label = label, x = X, y = Y, color = level),
data = nuts_brugge, size = 3, nudge_x = 2e5, nudge_y = 5e4) +
coord_sf(datum = NA, xlim = c(0, 10e5), ylim = c(64e5, 70e5)) +
scale_size_continuous(guide = 'none') +
scale_fill_discrete(guide = 'none') +
scale_color_discrete(guide = 'none') +
theme_void()
```
## Country names, country codes {.smaller .build}
- different names may refer to the same country (e.g. Germany / Federal Republic of Germany, BRD, etc.) → often not a good identifier
- ISO 3166-1 designates every country a two- or three-letter code (e.g. DE / DEU)
- often used in datasets (e.g. from the UN)
```{r, echo=FALSE, message=FALSE, warning=FALSE}
library(rnaturalearth)
library(dplyr)
world <- ne_countries(type = 'map_units', returnclass = 'sf', continent = 'europe')
world <- st_transform(world, 3035)
world <- st_crop(world, xmin = 0, xmax = 7e6, ymin = 0, ymax = 5e6)
labelsdf <- data.frame(label = sprintf('%s (%s)\nISO Codes: %s / %s',
world$name, world$formal_en, world$iso_a2, world$iso_a3), stringsAsFactors = FALSE)
#labelsdf$geometry <- st_centroid(world$geometry)
labels_coords <- as.data.frame(st_coordinates(st_centroid(world$geometry)))
labelsdf$x <- labels_coords$X
labelsdf$y <- labels_coords$Y
world$show_label <- sample(c(TRUE, FALSE), nrow(labelsdf), replace = TRUE, prob = c(0.15, 0.85)) & !is.na(world$iso_a2)
labelsdf <- labelsdf[world$show_label, ]
ggplot() + geom_sf(aes(fill = show_label), data = world) +
geom_label_repel(aes(label = label, x = x, y = y), data = labelsdf, size = 3, min.segment.length = 0, alpha = 0.85) +
scale_fill_brewer(guide = "none", palette = 'Dark2') +
labs(title = 'Random sample of European countries w/ ISO codes',
caption = 'source: Natural Earth Data') +
theme_bw() +
theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank())
```
## Finding NUTS in the EU {.smaller .build}
The *Nomenclature of Territorial Units for Statistics (NUTS)* divides the EU territory into regions at 3 different levels for socio-economic analyses of the regions.
Candidate countries and countries inside EFTA also have NUTS regions (Norway, Switzerland, Albania, Turkey, etc.).
<div class="fullfig">
<img src="figures/eurostat-nuts-overview.jpg" alt="NUTS levels" style="max-height:50%"><br>
<small>source: [Eurostat](https://ec.europa.eu/eurostat/web/nuts/background)</small>
</div>
## Finding NUTS in the EU {.smaller}
```{r, echo=FALSE, warning=FALSE}
nutsrg2 <- read_sf('data/nutsrg_2.json')
st_crs(nutsrg2) <- 3857
labelsdf <- data.frame(label = sprintf('%s: %s', nutsrg2$id, nutsrg2$na), stringsAsFactors = FALSE)
labels_coords <- as.data.frame(st_coordinates(st_centroid(nutsrg2$geometry)))
labelsdf <- bind_cols(labelsdf, labels_coords)
nutsrg2$show_label <- rep(FALSE, nrow(nutsrg2))
nutsrg2$show_label[sample(1:nrow(nutsrg2), 3)] <- TRUE
labelsdf <- labelsdf[nutsrg2$show_label, ]
ggplot() + geom_sf(aes(fill = show_label), data = nutsrg2) +
geom_label_repel(aes(label = label, x = X, y = Y), data = labelsdf, size = 3, min.segment.length = 0, alpha = 0.85) +
scale_fill_brewer(guide = "none", palette = 'Dark2') +
labs(title = 'Random sample of NUTS level 2 regions',
caption = 'source: Eurostat / https://github.com/eurostat/Nuts2json') +
theme_bw() +
theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank())
```
## AGS in Germany {.smaller .build}
In Germany, regions are hierarchically structured and identified by *"Amtlicher Gemeindeschlüssel" (AGS ~ "municipality identificator")*. The [*Gemeindeverzeichnis*](https://www.destatis.de/DE/Themen/Laender-Regionen/Regionales/_inhalt.html) contains all regions with their name, AGS, area, population and other features.
```{r, echo=FALSE, warning=FALSE}
library(jsonlite)
ger_al6 <- read_sf('data/Germany_AL6.GeoJson')
ger_al6$ags <- sapply(ger_al6$alltags, function(tags) {
ags <- fromJSON(tags)[['de:amtlicher_gemeindeschluessel']]
if (is.null(ags)) {
NA
} else {
ags
}
}, USE.NAMES = FALSE)
ger_al6 <- select(ger_al6, localname, ags, geometry)
ger_al6 <- st_transform(ger_al6, 5243)
labelsdf <- data.frame(label = sprintf('%s: %s', ger_al6$ags, ger_al6$localname), stringsAsFactors = FALSE)
labels_coords <- as.data.frame(st_coordinates(st_centroid(ger_al6$geometry)))
labelsdf <- bind_cols(labelsdf, labels_coords)
ger_al6$show_label <- rep(FALSE, nrow(ger_al6))
ger_al6$show_label[sample(1:nrow(ger_al6), 5)] <- TRUE
labelsdf <- labelsdf[ger_al6$show_label, ]
ggplot() + geom_sf(aes(fill = show_label), data = ger_al6) +
geom_label_repel(aes(label = label, x = X, y = Y), data = labelsdf, size = 3, min.segment.length = 0, alpha = 0.85) +
scale_fill_brewer(guide = "none", palette = 'Dark2') +
labs(title = 'Random sample of AGS at\nadministrative level 6 ("Kreise")',
caption = 'source: OpenStreetMap') +
theme_bw() +
theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank())
```
## Berlin: LOR codes {.smaller .build}
Since 2006, Berlin uses *"Lebensweltlich orientierte Räume" (LOR)* to structure the city area into sub-regions at different levels, each of them identified by "LOR" code:
Level 1: *Prognoseräume* (60 regions); Level 2: *Bezirksregionen* (138 regions); Level 3: *Planungsräume* (447 regions)
```{r, echo=FALSE, warning=FALSE}
bln_plan <- read_sf('data/Planungsraum_EPSG_25833.shp')
bln_plan$PLR_NAME <- iconv(bln_plan$PLR_NAME, from = 'LATIN1', to = 'UTF-8')
st_crs(bln_plan) <- 25833
labelsdf <- data.frame(label = sprintf('%s: %s', bln_plan$SCHLUESSEL, bln_plan$PLR_NAME), stringsAsFactors = FALSE)
labels_coords <- as.data.frame(st_coordinates(st_centroid(bln_plan$geometry)))
labelsdf <- bind_cols(labelsdf, labels_coords)
bln_plan$show_label <- rep(FALSE, nrow(bln_plan))
bln_plan$show_label[sample(1:nrow(bln_plan), 5)] <- TRUE
labelsdf <- labelsdf[bln_plan$show_label, ]
ggplot() + geom_sf(aes(fill = show_label), data = bln_plan) +
geom_label_repel(aes(label = label, x = X, y = Y), data = labelsdf, size = 3, min.segment.length = 0, alpha = 0.85) +
scale_fill_brewer(guide = "none", palette = 'Dark2') +
labs(title = 'Random sample of LOR regions at\n"Planungsraum" level in Berlin',
caption = 'source: Berlin Senate Department for Urban Development and Housing') +
theme_bw() +
theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank())
```
## What if I have no geo-identifier? {.smaller}
If your observations don't contain the identifiers you need, you can still do the following as long as you have **some information suitable for geo-coding** (e.g. address, city name, institution name to convert to a geo-coordinate):
1. find out the longitude/latitude coordinate for each observation
2. match each coordinate to its surrounding geographic entity (i.e. a municipality) by doing a "point-within-polygon" test
```{r, echo=FALSE, warning=FALSE, fig.height=4, fig.align='center'}
wzb_lon_lat <- c(13.365097, 52.506459)
pt_wzb_df <- data.frame(label = 'WZB')
pt_wzb_df$geometry <- st_transform(st_sfc(st_point(wzb_lon_lat), crs = 4326), 25833)
bln_plan$contains_wzb <- as.logical(st_contains(bln_plan$geometry, pt_wzb_df$geometry))
bln_plan$contains_wzb[is.na(bln_plan$contains_wzb)] <- FALSE
wzb_lor_region <- bln_plan[bln_plan$contains_wzb, c('SCHLUESSEL', 'PLR_NAME')]
ggplot() + geom_sf(aes(fill = contains_wzb), data = bln_plan) +
geom_sf(data = pt_wzb_df$geometry, color = 'white', size = 3) +
geom_sf_label(aes(label = label), data = pt_wzb_df, size = 3, alpha = 0.85, nudge_x = 100, nudge_y = -100, hjust = 0, vjust = 1) +
scale_fill_brewer(guide = "none", palette = 'Dark2') +
coord_sf(xlim = c(381e3, 399e3), ylim = c(5813e3, 5827e3), expand = FALSE) +
labs(title = sprintf('LOR region for WZB: %s\n(LOR code %s)', wzb_lor_region$PLR_NAME, wzb_lor_region$SCHLUESSEL),
subtitle = sprintf('Found for WZB geo-coordinates %.4f, %.4f', wzb_lon_lat[1], wzb_lon_lat[2]),
caption = 'source: Berlin Sen. Dept. for Urban Dev. and Housing') +
theme_bw() +
theme(axis.text = element_blank(),
axis.title = element_blank(),
axis.ticks = element_blank())
```
## Sources for geo-data
Have look in the handout document, where geo-data sources at world- and EU-level (NUTS) as well as sources for Germany (AGS) and Berlin (LOR) are presented.
## File formats {.smaller .build}
There are numerous file formats that store data for *geographic information system (GIS)* software. The most popular include:
- ESRI shapefile ("SHP"):
- consists of at least three files in the same directory with filename extensions `.shp`, `.shx` and `.dbf`
- GeoJSON (`.geojson`, `.json`) and TopoJSON (`.topojson`, `.json`)
- KML (`.kml`, `.kmz`): Keyhole Markup Language, used by Google Earth
- GML (`.gml`): Geography Markup Language
- AutoCAD DXF (`.dxf`)
You may also encounter other formats such as vector data files (`.svg`, `.poly`) or databases (`.sql`, `.sqlite`).
Good news: `sf` can handle all popular file formats.
## Exploring geo-data files with QGIS {.smaller}
[QGIS](https://www.qgis.org/) is a free, open-source GIS software running on all major operating systems.
- useful for exploring downloaded geo-data files
- I will show how to load a geo-data file and investigate properties of individual shapes
- if you've installed QGIS, you can follow along
<div class="fullfig">
<img src="figures/qgis-de-al6-bln.png" alt="QGIS screenshot" style="width:70%">
</div>