---
format: gfm
bibliography: references.bib
---
```{r eval=FALSE, echo=FALSE}
# Build the README (requires the quarto R package and the Quarto CLI):
quarto::quarto_render("README.qmd")
```
# odjitter
NOTE: This project is deprecated. Please use [od2net](https://github.com/Urban-Analytics-Technology-Platform/od2net) directly to generate route networks from OD data.
This repo contains the `odjitter` crate that implements a 'jittering' technique for pre-processing origin-destination (OD) data and an associated R interface package (see the [r](r/) subdirectory).
We hope to support other languages in the future (see [issue #23](https://github.com/dabreegster/odjitter/issues/23)).
## What is jittering?
Jittering is a method that takes OD data in a .csv file, plus zones and geographic datasets representing trip start and end points in .geojson files, and outputs geographic lines representing movement between the zones, which can be saved as GeoJSON files.
The name comes from jittering in a [data visualisation context](https://ggplot2-book.org/layers.html?q=noise#position), which refers to adding random noise to the location of points to prevent them from overlapping.
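The basic idea can be sketched in Rust, the language of the `odjitter` crate. This is a minimal, dependency-free illustration rather than the crate's implementation. It assumes subpoints have already been grouped by zone ID in a `HashMap`, and it uses a small xorshift generator in place of a proper random number crate (the names `XorShift` and `jitter_pair` are hypothetical). Each OD pair becomes a straight line from a randomly chosen subpoint in the origin zone to a randomly chosen subpoint in the destination zone.

```rust
use std::collections::HashMap;

/// A 2D point as (x, y), e.g. (longitude, latitude).
type Point = (f64, f64);

/// Minimal xorshift64 generator so the sketch needs no external crates.
/// Illustrative only; the seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Random index in 0..n (n must be greater than 0).
    fn index(&mut self, n: usize) -> usize {
        (self.next_u64() % n as u64) as usize
    }
}

/// Jitter one OD pair: pick a random subpoint in each zone and return the
/// straight line (origin point, destination point) between them.
/// Returns None if either zone has no subpoints.
fn jitter_pair(
    origin_zone: &str,
    destination_zone: &str,
    subpoints: &HashMap<String, Vec<Point>>,
    rng: &mut XorShift,
) -> Option<(Point, Point)> {
    let origins = subpoints.get(origin_zone).filter(|v| !v.is_empty())?;
    let destinations = subpoints.get(destination_zone).filter(|v| !v.is_empty())?;
    let o = origins[rng.index(origins.len())];
    let d = destinations[rng.index(destinations.len())];
    Some((o, d))
}
```

Because origins and destinations are drawn independently for every line, desire lines fan out across the network instead of all meeting at zone centroids.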
## Why jitter?
For a more detailed description of the method and an explanation of why it is useful, especially when modeling active modes that require dense active travel networks, see the paper [Jittering: A Computationally Efficient Method for Generating Realistic Route Networks from Origin-Destination Data](https://findingspress.org/article/33873-jittering-a-computationally-efficient-method-for-generating-realistic-route-networks-from-origin-destination-data) [@lovelace_jittering_2022b].
# Installation
Install the package from the system command line as follows (you need to have installed and set up [cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) first):
```bash
cargo install --git https://github.com/dabreegster/odjitter
```
To check that the installation worked, run the `odjitter` command without arguments.
If it prints the following message, congratulations, it works 🎉
```{r, engine='bash', error=TRUE}
odjitter
```
As shown in the output above, the `odjitter` command line tool has the subcommands `disaggregate` and `jitter`.
The main difference between these commands is that `jitter` returns OD pairs representing multiple trips or fractions of a trip.
`disaggregate`, by contrast, returns data representing single trips.
## Docker
Alternatively, you can run through Docker: `docker run -t abstreet/odjitter <CMD>`. See below for command line usage, or start with `help`.
NOTE: There is no guarantee that the Docker image includes the latest changes from this repository. File an issue if you think the Docker version is out of date and you need something newer.
(For maintainers only: to build and push a new version, `docker build -t odjitter . && docker tag odjitter abstreet/odjitter:latest && docker push abstreet/odjitter:latest`.)
# `jitter` OD data
To jitter OD data you need a minimum of three inputs. Examples of each are provided in the [`data/` folder](https://github.com/dabreegster/odjitter/tree/main/data) of this repo, and their first few lines are shown below:
1. A [.csv file](https://github.com/dabreegster/odjitter/blob/main/data/od.csv) containing OD data with two columns containing zone IDs (specified with `--origin-key=geo_code1 --destination-key=geo_code2` by default) and other columns representing trip counts:
|geo_code1 |geo_code2 | all| from_home| train| bus| car_driver| car_passenger| bicycle| foot| other|
|:---------|:---------|---:|---------:|-----:|---:|----------:|-------------:|-------:|----:|-----:|
|S02001616 |S02001616 | 82| 0| 0| 3| 6| 0| 2| 71| 0|
|S02001616 |S02001620 | 188| 0| 0| 42| 26| 3| 11| 105| 1|
|S02001616 |S02001621 | 99| 0| 0| 13| 7| 3| 15| 61| 0|
2. A [.geojson file](https://github.com/dabreegster/odjitter/blob/main/data/zones.geojson) representing zones that contains values matching the zone IDs in the OD data (the field containing zone IDs is specified with `--zone-name-key=InterZone` by default):
```{bash}
head -6 data/zones.geojson
```
3. One or more [.geojson files](https://github.com/dabreegster/odjitter/blob/main/data/road_network.geojson) representing geographic entities (e.g. road networks) from which origin and destination points are sampled:
```{bash}
head -6 data/road_network.geojson
```
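Sampling points within zones requires knowing which zone each subpoint falls in. One standard way to do that is the even-odd (ray casting) point-in-polygon test, sketched below in Rust. It is an illustration under stated assumptions, not necessarily how `odjitter` does it: zones are simple polygons without holes given as vertex lists, subpoints are single coordinates (for line features such as roads one might use their vertices), and the function names are hypothetical.

```rust
/// A 2D point as (x, y).
type Point = (f64, f64);

/// Even-odd rule: cast a horizontal ray to the right of `p` and count how
/// many polygon edges it crosses. An odd count means `p` is inside.
/// `polygon` is a list of vertices; the closing edge is implied.
fn point_in_polygon(p: Point, polygon: &[Point]) -> bool {
    let (x, y) = p;
    let n = polygon.len();
    let mut inside = false;
    let mut j = n.wrapping_sub(1); // index of the previous vertex
    for i in 0..n {
        let (xi, yi) = polygon[i];
        let (xj, yj) = polygon[j];
        // Edge (j -> i) straddles the horizontal line through p (so yi != yj)
        // and crosses it to the right of p.
        if (yi > y) != (yj > y) && x < (xj - xi) * (y - yi) / (yj - yi) + xi {
            inside = !inside;
        }
        j = i;
    }
    inside
}

/// Return the ID of the first zone whose polygon contains `p`, if any.
/// Zones are (hypothetical) (zone ID, vertex list) pairs.
fn zone_of<'a>(p: Point, zones: &'a [(String, Vec<Point>)]) -> Option<&'a str> {
    zones
        .iter()
        .find(|zone| point_in_polygon(p, &zone.1))
        .map(|zone| zone.0.as_str())
}
```

Checking every zone for every point costs O(points × zones); for large inputs a spatial index would normally be used to narrow the candidate zones first.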
The `jitter` command requires you to set the maximum number of trips represented by each desire line in the jittered result, with the `--disaggregation-threshold` argument.
A value of 1 creates a line for every trip in the dataset, while a value above the maximum number of trips in the 'all' column of the OD data results in a jittered dataset with the same number of desire lines (the geographic representation of OD pairs) as the input (50 in this case).
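The effect of the threshold on the number of desire lines can be worked out by hand. The Rust sketch below assumes, consistent with `jitter` returning fractions of trips, that each OD row is split into ceil(total / threshold) lines and that every count column is divided equally among them. `split_row` is a hypothetical name and the crate's exact rule may differ.

```rust
/// Split one OD row's trip counts into ceil(total / threshold) equal parts,
/// so no resulting desire line carries more than `threshold` trips of the
/// disaggregation key column (e.g. "all", at `key_index`).
/// Returns (number of lines, per-line count for each column).
fn split_row(counts: &[f64], key_index: usize, threshold: f64) -> (usize, Vec<f64>) {
    assert!(threshold > 0.0, "threshold must be positive");
    let total = counts[key_index];
    // Keep at least one line per OD pair, even when the total is 0.
    let n_lines = ((total / threshold).ceil() as usize).max(1);
    // Divide every count column equally; results may be fractional.
    let per_line = counts.iter().map(|c| c / n_lines as f64).collect();
    (n_lines, per_line)
}
```

Under that assumption, the first three rows of the table above (82, 188 and 99 trips in the 'all' column) become 2, 4 and 2 lines with a threshold of 50, and 9, 19 and 10 lines with a threshold of 10.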
With reference to the test data in this repo, you can run the `jitter` command line tool as follows:
```{bash}
odjitter jitter --od-csv-path data/od.csv \
--zones-path data/zones.geojson \
--subpoints-origins-path data/road_network.geojson \
--subpoints-destinations-path data/road_network.geojson \
--disaggregation-threshold 50 \
--output-path data/output_max50.geojson
```
Try running it with a different `disaggregation-threshold` value (10 in the command below):
```{bash}
odjitter jitter --od-csv-path data/od.csv \
--zones-path data/zones.geojson \
--subpoints-origins-path data/road_network.geojson \
--subpoints-destinations-path data/road_network.geojson \
--disaggregation-threshold 10 \
--output-path data/output_max10.geojson
```
You can run odjitter on OD datasets in which the features in the origins are different from the features in the destinations, e.g. if you have data on movement between residential areas and parks.
However, you first need to combine the geographic datasets representing origins and destinations into a single object.
An example is demonstrated in the code chunk below.
```{bash}
odjitter jitter --od-csv-path data/od_destinations.csv \
--zones-path data/zones_combined.geojson \
--subpoints-origins-path data/road_network.geojson \
--subpoints-destinations-path data/road_network.geojson \
--disaggregation-threshold 50 \
--output-path data/output_destinations_differ_50.geojson
```
# Outputs
The figure below shows the output of the `jitter` commands above: the left image shows unjittered results, with origins and destinations going to zone centroids (as in many if not most visualisations of desire lines between zones), the central image shows the result of setting the `disaggregation-threshold` argument to 50, and the right-hand image shows the result of setting `disaggregation-threshold` to 10.
You can call the Rust code from R, as illustrated by the code below, which generates the datasets shown in the figure.
```{r, message=FALSE}
#| echo: true
remotes::install_github("dabreegster/odjitter", subdir = "r")
# Note: code to generate the visualisation below
od = readr::read_csv("data/od.csv")
zones = sf::read_sf("data/zones.geojson")
network = sf::read_sf("data/road_network.geojson")
od_sf = od::od_to_sf(od, zones)
odjittered_max_50 = odjitter::jitter(od, zones, network, disaggregation_threshold = 50)
odjittered_max_10 = odjitter::jitter(od, zones, network, disaggregation_threshold = 10)
```
```{r fig.width=8, fig.height=2, message=FALSE}
#| echo: false
#| label: thresholddemo
#| fig-cap: "Demonstration of the effect of the disaggregation threshold on the number of desire lines"
library(ggplot2)
odjittered_long = rbind(
od_sf |> dplyr::transmute(type = "Unjittered"),
odjittered_max_50 |> dplyr::transmute(type = "--disaggregation-threshold 50"),
odjittered_max_10 |> dplyr::transmute(type = "--disaggregation-threshold 10")
)
# Convert type to ordered factor so that it is plotted in the correct order:
odjittered_long$type = factor(odjittered_long$type, levels = c("Unjittered", "--disaggregation-threshold 50", "--disaggregation-threshold 10"))
odjittered_long |>
ggplot() +
geom_sf() +
geom_sf(data = zones, fill = NA, color = "grey") +
geom_sf(data = network, fill = NA, color = "red") +
facet_wrap(~type) +
theme_void()
```
Note: `odjitter` uses a random number generator to sample points, so the output will differ each time you run it unless you set the `--rng-seed` argument (see the `--help` output in the Details section below).
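Why a seed makes the output repeatable can be shown with any pseudo-random generator: the same seed always produces the same sequence of draws, and therefore the same sampled subpoints. The sketch below uses a minimal xorshift generator, which is an assumption for illustration and not the generator `odjitter` uses; `sample_indices` is a hypothetical helper.

```rust
/// Minimal xorshift64 generator (illustrative; the seed must be non-zero).
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Draw `k` indices into a list of `n` subpoints (n > 0) from a generator
/// seeded with `seed`. Same seed, same indices, same jittered output.
fn sample_indices(seed: u64, n: usize, k: usize) -> Vec<usize> {
    let mut rng = XorShift(seed);
    (0..k).map(|_| (rng.next_u64() % n as u64) as usize).collect()
}
```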
The `--subpoints-origins-path` and `--subpoints-destinations-path` arguments can be used to generate jittered desire lines that start from or go to particular points, defined in .geojson files.
We will demonstrate this on a simple imaginary example:
```{bash}
head data/od_schools.csv
```
Set the origin, destination and disaggregation keys as follows (setting `--disaggregation-key` to `car` means that the maximum number of car trips per desire line is 10 in this case):
```{bash}
odjitter jitter --od-csv-path data/od_schools.csv \
--zones-path data/zones.geojson \
--origin-key origin \
--destination-key destination \
--subpoints-origins-path data/road_network.geojson \
--subpoints-destinations-path data/schools.geojson \
--disaggregation-key car \
--disaggregation-threshold 10 \
--output-path output_max10_schools.geojson
```
You can also set weights associated with each origin and destination in the input data.
The following example weights trips to schools in proportion to the values in the 'weight' property of each imaginary point in `schools.geojson`:
```{bash}
odjitter jitter --od-csv-path data/od_schools.csv \
--zones-path data/zones.geojson \
--origin-key origin \
--destination-key destination \
--subpoints-origins-path data/road_network.geojson \
--subpoints-destinations-path data/schools.geojson \
--disaggregation-key car \
--disaggregation-threshold 10 \
--weight-key-destinations weight \
--output-path output_max10_schools_with_weights.geojson
```
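Weighted sampling of this kind is usually done by accumulating the weights and locating a uniform random number within the running total. The Rust sketch below shows that technique for choosing one destination subpoint. It is an illustration rather than the crate's code; it assumes non-negative weights and takes the uniform draw `u` in [0, 1) as a parameter so it stays deterministic and dependency-free (`weighted_choice` is a hypothetical name).

```rust
/// Choose an index with probability proportional to `weights[i]`,
/// given a uniform random number `u` in [0, 1).
/// Zero or negative weights are never chosen; returns None if no weight
/// is positive.
fn weighted_choice(weights: &[f64], u: f64) -> Option<usize> {
    let total: f64 = weights.iter().filter(|w| **w > 0.0).sum();
    if total <= 0.0 {
        return None;
    }
    // Walk the cumulative sum until it exceeds u * total.
    let target = u * total;
    let mut cumulative = 0.0;
    let mut last_positive = None;
    for (i, &w) in weights.iter().enumerate() {
        if w <= 0.0 {
            continue;
        }
        cumulative += w;
        last_positive = Some(i);
        if target < cumulative {
            return Some(i);
        }
    }
    // Guard against floating-point rounding when u is very close to 1.
    last_positive
}
```

With weights `[1.0, 3.0]`, any `u` below 0.25 selects the first point and the rest select the second, so the second point is chosen three times as often.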
# `disaggregate` OD data
Sometimes it's useful to convert aggregate OD datasets into movement data at the trip level, with one record per trip or stage.
Microsimulation or agent-based modelling in transport simulation software such as [A/B Street](https://github.com/a-b-street/abstreet) is an example where disaggregate data may be needed.
The `disaggregate` command performs this full disaggregation, as demonstrated below.
```{bash}
odjitter disaggregate --od-csv-path data/od.csv \
--zones-path data/zones.geojson \
--output-path output_individual.geojson
```
```{bash}
head output_individual.geojson
rm output_individual.geojson
```
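Conceptually, full disaggregation expands each OD row into one record per trip. The Rust sketch below shows that step for the mode columns of one row (excluding the 'all' total). It assumes whole-number counts and labels each record with the mode it came from; that output schema is an assumption, so check the properties in `output_individual.geojson` for the real one. The `Trip` struct and `disaggregate_row` are hypothetical, and sampling of start and end points is left out.

```rust
/// One disaggregated trip between two zones, labelled with its mode.
#[derive(Debug, Clone, PartialEq)]
struct Trip {
    origin: String,
    destination: String,
    mode: String,
}

/// Expand one OD row into individual trips. `mode_counts` pairs each mode
/// column (e.g. "bus", "foot") with its trip count; fractional counts are
/// rounded to the nearest whole trip (an assumption of this sketch).
fn disaggregate_row(origin: &str, destination: &str, mode_counts: &[(&str, f64)]) -> Vec<Trip> {
    let mut trips = Vec::new();
    for &(mode, count) in mode_counts {
        let n = count.round().max(0.0) as usize;
        for _ in 0..n {
            trips.push(Trip {
                origin: origin.to_string(),
                destination: destination.to_string(),
                mode: mode.to_string(),
            });
        }
    }
    trips
}
```

For the first row of `data/od.csv` (3 bus, 6 car driver, 2 bicycle and 71 foot trips), this yields 82 records, matching the 'all' column.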
# Details
Full details on the arguments of each of `odjitter`'s subcommands can be viewed with the `--help` flag:
```{bash}
odjitter jitter --help
odjitter disaggregate --help
```
# Similar work
The technique is implemented in the function [`od_jitter()`](https://itsleeds.github.io/od/reference/od_jitter.html) from the R package [`od`](https://itsleeds.github.io/od/index.html).
The functionality contained in this repo is an extended and much faster implementation: according to our benchmarks on a large dataset it was around 1000 times faster than the R implementation.
# References
```{bash, echo=FALSE}
rm output_max*
```