README.qmd

---
format: gfm
bibliography: references.bib
---

```{r eval=FALSE, echo=FALSE}
# Build the paper:
Rscript -e 'rmarkdown::render("README.Rmd")'
```

# odjitter

NOTE: This project is deprecated. Please use [od2net](https://github.com/Urban-Analytics-Technology-Platform/od2net) directly to generate route networks from OD data.

This repo contains the `odjitter` crate that implements a 'jittering' technique for pre-processing origin-destination (OD) data and an associated R interface package (see the [r](r/) subdirectory).
We hope to support other languages in the future (see [issue #23](https://github.com/dabreegster/odjitter/issues/23)).

## What is jittering?

Jittering is a method that takes OD data in a .csv file plus zones and geographic datasets representing trip start and end points in .geojson files and outputs geographic lines representing movement between the zones that can be stored as GeoJSON files.
The name comes from jittering in a [data visualisation context](https://ggplot2-book.org/layers.html?q=noise#position), which refers to the addition of random noise to the location of points, preventing them overlapping.

## Why jitter?

For a more detailed description of the method and an explanation of why it is useful, especially when modeling active modes that require dense active travel networks, see the paper [Jittering: A Computationally Efficient Method for Generating Realistic Route Networks from Origin-Destination Data](https://findingspress.org/article/33873-jittering-a-computationally-efficient-method-for-generating-realistic-route-networks-from-origin-destination-data) [@lovelace_jittering_2022b].

# Installation

Install the package from the system command line as follows (you need to have installed and set-up [cargo](https://doc.rust-lang.org/cargo/getting-started/installation.html) first):

```bash
cargo install --git https://github.com/dabreegster/odjitter
```

To check the package installation worked, you can run `odjitter` command without arguments.
If it prints the following message congratulations, it works 🎉

```{r, engine='bash', error=TRUE}
odjitter
```

As shown in the output above the `odjitter` command line tools has subcommands: `disaggregate` and `jitter`.
The main difference between these commands is that `jitter` returns OD pairs representing multiple trips or fractions of a trip.
`disaggregate`, by contrast, returns data representing single trips.

## Docker

Alternatively, you can run through Docker: `docker run -t abstreet/odjitter <CMD>`. See below for command line usage, or start with `help`.

NOTE: There's no maintenance guarantee the Docker image has up-to-date changes from this repository. File an issue if you think the Docker version is out-of-date and you need something newer.

(For maintainers only: to build and push a new version, `docker build -t odjitter . && docker tag odjitter abstreet/odjitter:latest && docker push abstreet/odjitter:latest`.)

# `jitter` OD data

To jitter OD data you need a minimum of three inputs, examples of which are provided in the [`data/` folder](https://github.com/dabreegster/odjitter/tree/main/data) of this repo, the first few lines of which are illustrated below:

1. A [.csv file](https://github.com/dabreegster/odjitter/blob/main/data/od.csv) containing OD data with two columns containing zone IDs (specified with  `--origin-key=geo_code1 --destination-key=geo_code2` by default) and other columns representing trip counts:


|geo_code1 |geo_code2 | all| from_home| train| bus| car_driver| car_passenger| bicycle| foot| other|
|:---------|:---------|---:|---------:|-----:|---:|----------:|-------------:|-------:|----:|-----:|
|S02001616 |S02001616 |  82|         0|     0|   3|          6|             0|       2|   71|     0|
|S02001616 |S02001620 | 188|         0|     0|  42|         26|             3|      11|  105|     1|
|S02001616 |S02001621 |  99|         0|     0|  13|          7|             3|      15|   61|     0|

2. A [.geojson file](https://github.com/dabreegster/odjitter/blob/main/data/zones.geojson) representing zones that contains values matching the zone IDs in the OD data (the field containing zone IDs is specified with `--zone-name-key=InterZone` by default):

```{bash}
head -6 data/zones.geojson
```

3. One or more [.geojson file](https://github.com/dabreegster/odjitter/blob/main/data/road_network.geojson) representing geographic entities (e.g. road networks) from which origin and destination points are sampled

```{bash}
head -6 data/road_network.geojson
```

The `jitter` command requires you to set the maximum number of trips for all trips in the jittered result, with the argument `disaggregation-threshold``.
A value of 1 will create a line for every trip in the dataset, a value above the maximum number of trips in the 'all' column in the OD data will result in a jittered dataset that has the same number of desire lines (the geographic representation of OD pairs) as in the input (50 in this case).

With reference to the test data in this repo, you can run the `jitter` command line tool as follows:

```{bash}
odjitter jitter --od-csv-path data/od.csv \
  --zones-path data/zones.geojson \
  --subpoints-origins-path data/road_network.geojson \
  --subpoints-destinations-path data/road_network.geojson \
  --disaggregation-threshold 50 \
  --output-path data/output_max50.geojson
```

Try running it with a different `disaggregation-threshold` value (10 in the command below):

```{bash}
odjitter jitter --od-csv-path data/od.csv \
  --zones-path data/zones.geojson \
  --subpoints-origins-path data/road_network.geojson \
  --subpoints-destinations-path data/road_network.geojson \
  --disaggregation-threshold 10 \
  --output-path data/output_max10.geojson
```

You can run odjitter on OD datasets in which the features in the origins are different from the features in the destinations, e.g. if you have data on movement between residential areas and parks.
However, you need to first combine the geographic dataset representing origins and the geographic destinations representing destinations into a single object.
An example of this type of this is is demonstrated in the code chunk below.

```{bash}
odjitter jitter --od-csv-path data/od_destinations.csv \
  --zones-path data/zones_combined.geojson \
  --subpoints-origins-path data/road_network.geojson \
  --subpoints-destinations-path data/road_network.geojson \
  --disaggregation-threshold 50 \
  --output-path data/output_destinations_differ_50.geojson
```

# Outputs

The figure below shows the output of the `jitter` commands above visually, with the left image showing unjittered results with origins and destinations going to zone centroids (as in many if not most visualisations of desire lines between zones), the central image showing the result after setting `disaggregation-threshold` argument to 50, and the right hand figure showing the result after setting `disaggregation-threshold` to 10.

You can call the Rust code from R, as illustrated by the code below which generates the datasets shown in the figures below.

```{r, message=FALSE}
#| echo: true
remotes::install_github("dabreegster/odjitter", subdir = "r")
# Note: code to generate the visualisation below
od = readr::read_csv("data/od.csv")
zones = sf::read_sf("data/zones.geojson")
network = sf::read_sf("data/road_network.geojson")
od_sf = od::od_to_sf(od, zones)
odjittered_max_50 = odjitter::jitter(od, zones, network, disaggregation_threshold = 50)
odjittered_max_10 = odjitter::jitter(od, zones, network, disaggregation_threshold = 10)
```

```{r fig.width=8, fig.height=2, message=FALSE}
#| echo: false
#| label: thresholddemo
#| fig-cap: "Demonstration of the effect of the disaggregation threshold on the number of desire lines"
library(ggplot2)
odjittered_long = rbind(
  od_sf |> dplyr::transmute(type = "Unjittered"),
  odjittered_max_50 |> dplyr::transmute(type = "--disaggregation-threshold 50"),
  odjittered_max_10 |> dplyr::transmute(type = "--disaggregation-threshold 10")
)
# Convert type to ordered factor so that it is plotted in the correct order:
odjittered_long$type = factor(odjittered_long$type, levels = c("Unjittered", "--disaggregation-threshold 50", "--disaggregation-threshold 10"))
odjittered_long |>
  ggplot() +
  geom_sf() +
  geom_sf(data = zones, fill = NA, color = "grey") +
  geom_sf(data = network, fill = NA, color = "red") +
  facet_wrap(~type) +
  theme_void()
```

Note: `odjitter` uses a random number generator to sample points, so the output will change each time you run it, unless you set the `rng-seed`, as documented in the next section.

The `subpoints-origins-path` and `subpoints-destinations-path` can be used to generate jittered desire lines that start from or go to particular points, defined in .geojson files.
We will demonstrate this on a simple imaginary example:

```{bash}
head data/od_schools.csv
```

Set the origin, destination, and threshold keys (to car meaning that the max n. car trips per OD pair is 10 in this case) as follows:

```{bash}
odjitter jitter --od-csv-path data/od_schools.csv \
  --zones-path data/zones.geojson \
  --origin-key origin \
  --destination-key destination \
  --subpoints-origins-path data/road_network.geojson \
  --subpoints-destinations-path data/schools.geojson \
  --disaggregation-key car \
  --disaggregation-threshold 10 \
  --output-path output_max10_schools.geojson
```

You can also set weights associated with each origin and destination in the input data.
The following example weights trips to schools proportional to the values in the 'weight' key for each imaginary data point represented in the `schools.geojson` object:

```{bash}
odjitter jitter --od-csv-path data/od_schools.csv \
  --zones-path data/zones.geojson \
  --origin-key origin \
  --destination-key destination \
  --subpoints-origins-path data/road_network.geojson \
  --subpoints-destinations-path data/schools.geojson \
  --disaggregation-key car \
  --disaggregation-threshold 10 \
  --weight-key-destinations weight \
  --output-path output_max10_schools_with_weights.geojson
```

# `disaggregate` OD data

Sometimes it's useful to convert aggregate OD datasets into movement data at the trip level, with one record per trip or stage.
Microsumulation or agent-based modelling in transport simulation software such as [A/B Street](https://github.com/a-b-street/abstreet) is an example where disaggregate data may be needed.
The `disaggregate` command does this full disaggregation work, as demonstrated below.

```{bash}
odjitter disaggregate --od-csv-path data/od.csv \
  --zones-path data/zones.geojson \
  --output-path output_individual.geojson
```

```{bash}
head output_individual.geojson
rm output_individual.geojson
```


# Details

For full details on the arguments of each of `odjitter`'s subcommands can be viewed with the `--help` flag:

```{bash}
odjitter jitter --help
odjitter disaggregate --help
```

# Similar work

The technique is implemented in the function [`od_jitter()`](https://itsleeds.github.io/od/reference/od_jitter.html) from the R package [`od`](https://itsleeds.github.io/od/index.html).
The functionality contained in this repo is an extended and much faster implementation: according to our benchmarks on a large dataset it was around 1000 times faster than the R implementation.


# References

```{bash, echo=FALSE}
rm output_max*
```