setup
jbrinks committed Dec 11, 2023
1 parent d7d529b commit 81d674a
Showing 10 changed files with 10,366 additions and 0 deletions.
4 changes: 4 additions & 0 deletions .gitignore
@@ -0,0 +1,4 @@
/.quarto/
wsim-gldas-vis_files
water-wsim-gldas-v1-composite-anom-one-month-netcdf
water-wsim-gldas-v1-composite-one-month-netcdf
Empty file added .nojekyll
16 changes: 16 additions & 0 deletions _quarto.yml
@@ -0,0 +1,16 @@
project:
title: "TOPS SCHOOL Water Resources Module"
type: website
output-dir: docs
format:
html:
embed-resources: true
theme:
- sandstone
css: "https://github.com/ciesin-geospatial/TOPSTSCHOOL/blob/main/custom.scss"
toc: true
toc-title: Contents
toc-location: left



Empty file added docs/.nojekyll
8 changes: 8 additions & 0 deletions docs/index.html
@@ -0,0 +1,8 @@
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>Redirect to wsim-gldas-acquisition.html</title>
<meta http-equiv="refresh" content="0;URL='wsim-gldas-acquisition.html'" />
</head>
<body>
</body>
</html>
121 changes: 121 additions & 0 deletions docs/search.json

Large diffs are not rendered by default.

4,987 changes: 4,987 additions & 0 deletions docs/wsim-gldas-acquisition.html

Large diffs are not rendered by default.

4,892 changes: 4,892 additions & 0 deletions docs/wsim-gldas-vis.html

Large diffs are not rendered by default.

161 changes: 161 additions & 0 deletions wsim-gldas-acquisition.qmd
@@ -0,0 +1,161 @@
---
title: "Acquiring and Pre-Processing the WSIM-GLDAS Dataset"
author: "Josh Brinks"
date: "December, 6 2023"
---

## TO DO

* Add context for package selection
+ stars for raster management
+ sf for vector/shapefile/geojson
+ lubridate for date management
* More dataset context/explanation (geoBoundaries vs. GADM).
* Citations and external links for data and packages.
* Decide on which wsim-gldas version to use.
+ Switch out the current for a 12-month integration anomaly.
* Write out the smaller pre-processed file to disk for potential use in binder workshop or subsequent lesson in the module.
* Introduce automated data acquisition with a SEDAC or Earthdata package?

## Introduction

WSIM-GLDAS can be downloaded from [SEDAC](https://sedac.ciesin.columbia.edu/data/set/water-wsim-gldas-v1). Downloads are organized by a combination of variable (composite surplus/deficit, temperature, PETmE, runoff, soil moisture, precipitation) and integration period (1, 3, 6, 12 months). Each variable-integration combination consists of a NetCDF raster file with a time dimension that contains a raster layer for each of the 804 months between January 1948 and December 2014. Some variables also contain multiple attributes, each with its own time dimension of 804 rasters. Hence, this is a large file that takes upwards of 2 hours to download and may cause memory issues on some systems. We will work with the composite anomalies integrated over 1-month periods.

## Reading the Data

Once you've completed the download and placed the .nc file in your working directory, read it in with the `stars::read_stars()` function.

```{r}
# proxy = TRUE will limit memory usage, but it does
# not always work with certain downstream processing functions
wsim_gldas_anoms <- stars::read_stars("composite_anom_1mo.nc", proxy = FALSE)
print(wsim_gldas_anoms)
```

The print command gives some basic information. The output tells us we have 5 attributes (deficit, deficit_cause, surplus, surplus_cause, both) and 3 dimensions. The first 2 dimensions are the spatial extents (x/y--longitude/latitude), and time is the 3rd dimension.

This means the total number of individual raster layers in this NetCDF is 4020 (5 attributes x 804 time steps/months).

## Attribute Selection

We can start paring this down by subsetting for just the combined surplus/deficit anomaly (both).

```{r}
names(wsim_gldas_anoms)
wsim_gldas_anoms <- wsim_gldas_anoms['both']
```

## Time Selection

Specifying a temporal range of interest will free up more space. We'll grab every month for 2000-2014. This can be accomplished by generating a sequence for every month between January 2000 and December 2014, and then passing that vector of dates to `filter`.


```{r}
# generate a vector of dates for subsetting
keeps <- seq(lubridate::ymd("2000-01-01"),
             lubridate::ymd("2014-12-01"),
             by = "month")
# filter using that vector
wsim_gldas_anoms <- dplyr::filter(wsim_gldas_anoms, time %in% keeps)
print(wsim_gldas_anoms)
```

Now we're down to a single attribute ("both") with 180 time-steps. We can take a look at the first 6 months by passing the object through `slice` and then into `plot`.

```{r}
wsim_gldas_anoms |>
  dplyr::slice(index = 1:6, along = "time") |>
  plot(key.pos = 1)
```

Although we've pared it down to a single attribute and a restricted time period, we can take it a step further and reduce the spatial extent to a country or state of interest.

## Spatial Selection

We can spatially crop the raster stack with a few different methods. Options include using a bounding box (xmin, ymin, xmax, ymax), another raster object, or a vectorized boundary like a shapefile or geojson.
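
As a quick illustration, a bounding-box crop might look like the sketch below; the coordinates are illustrative and roughly cover the contiguous United States.

```{r}
# illustrative bounding box; st_crop() subsets the stars object to this extent
conus_box <- sf::st_bbox(c(xmin = -125, ymin = 24, xmax = -66, ymax = 50),
                         crs = sf::st_crs(wsim_gldas_anoms))
wsim_gldas_conus <- sf::st_crop(wsim_gldas_anoms, conus_box)
```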

Using a vector boundary is one of the more common geoprocessing tasks. In this example, we'll pull a geojson of the United States from the geoBoundaries API. You can also download vectorized boundaries directly from [www.geoboundaries.org](https://www.geoboundaries.org).

The call to geoBoundaries' API is pretty simple:

"https://www.geoboundaries.org/api/current/gbOpen/*ISO3C*/*LEVEL*/"

We adjust the italicized components of the URL to specify the country we want, using the 3-character ISO country code (USA) and the desired administrative level (ADM1).

```{r}
usa <- httr::GET("https://www.geoboundaries.org/api/current/gbOpen/USA/ADM1/")
```

After the `GET` call, we have to translate the `content`.

```{r}
usa <- httr::content(usa)
names(usa)
```

The parsed content object contains 32 components. Item 29 is a direct link to the geojson file (gjDownloadURL). Read that in and check the visuals.

```{r}
usa <- sf::st_read(usa$gjDownloadURL)
plot(sf::st_geometry(usa))
```

This looks good, but it includes all United States territories. For simplicity, we can get it down to only the contiguous United States.

```{r}
drops <-
  c("Alaska", "Hawaii",
    "American Samoa",
    "Puerto Rico",
    "Commonwealth of the Northern Mariana Islands",
    "Guam",
    "United States Virgin Islands")

usa <- usa[!(usa$shapeName %in% drops), ]
plot(sf::st_geometry(usa))
```

We can take this a step further and select a target state.


```{r}
texas <- usa[usa$shapeName == "Texas",]
plot(sf::st_geometry(texas))
```

From here we can crop the WSIM-GLDAS raster stack by indexing it with the stored Texas boundary.


```{r}
wsim_gldas_anoms_tex <- wsim_gldas_anoms[texas]
```

For a final visual check, we'll take the last time step in the WSIM-GLDAS dataset (index 180: December 2014) and plot it with an overlay of the Texas boundary.


```{r}
wsim_gldas_anoms_tex |>
  dplyr::slice(index = 180, along = "time") |>
  plot(reset = FALSE)

plot(sf::st_geometry(texas),
     add = TRUE,
     lwd = 3,
     col = NA, # transparent fill so the underlying raster stays visible
     border = 'purple')
```

The subsetted dataset can be written to disk and saved for use in future modules.

```{r}
stars::write_stars(wsim_gldas_anoms_tex, "wsim_gldas_tex.nc")
```

The pre-processed dataset is only 1.6 MB, compared to 1.7 GB for the original download. This is much more manageable in cloud environments, workshops, and git platforms.
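
If you want to confirm the on-disk size yourself, base R's `file.size()` reports it in bytes.

```{r}
# file size on disk, converted to megabytes
file.size("wsim_gldas_tex.nc") / 1e6
```
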
177 changes: 177 additions & 0 deletions wsim-gldas-vis.qmd
@@ -0,0 +1,177 @@
---
title: "WSIM-GLDAS Dataset Exploration and Visualizations"
author: "Joshua Brinks"
date: "December, 6 2023"
---

## TO DO

* Write the actual code and narrative.
* Determine the region and time period of focus to draw in our use cases/human focused stories.
* Determine the method of exploration.
+ Mimic our process?
* 12-month integration panels of the CONUSA from 2000-2014 to identify areas of interest.
* Zoom in to locations of interest and switch to 1-month integration for the years identified in the previous step.

## Introduction

* Raster/vector visualization background?
+ General
+ Water resource specific
* Package background
+ Basic plotting with stars/sf
+ more advanced plotting with ggplot/ggmap

## Setup

```r
library(stars) # raster manipulation
library(sf) # vector manipulation
library(ggplot2) # advanced plotting
library(lubridate) # date/time manipulation
library(exactextractr) # zonal statistics

```

## Load Data

Load in data from previous vignette.

I think the previous vignette should end with a 2000-2014 12-month integration CONUS dataset.

```r
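# A minimal sketch, assuming the acquisition vignette wrote a pre-processed file to
# disk; the file name below is illustrative and should match whatever was saved there.
wsim_gldas <- stars::read_stars("wsim_gldas_conus_12mo.nc", proxy = FALSE)
print(wsim_gldas)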

```

Verify the data structure with `print()` or `summary()`.

## Exploratory Histogram

Create a histogram of the raster values for a single time step.

The basic plotting method is OK, but check whether it can be done with `ggplot` so we can use a uniform palette across all modules.

```r
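# One possible approach (object name assumed from the Load Data sketch above):
# pull the first time step and histogram its cell values.
first_month <- dplyr::slice(wsim_gldas, index = 1, along = "time")
vals <- as.vector(first_month[[1]])
hist(vals, breaks = 50,
     main = "WSIM-GLDAS composite anomalies, first time step",
     xlab = "Anomaly (return period)")
# ggplot alternative for a uniform look across modules:
# ggplot2::ggplot(data.frame(value = vals), ggplot2::aes(value)) +
#   ggplot2::geom_histogram(bins = 50)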

```

Extreme values or other items of note might require additional visualization or other data exploration.

## Multi-Panel Time Series

Create a multi-panel time series of the 12-month integration CONUSA data, similar to what we used to identify our case studies. Each panel will represent 1 year.

Load in a CONUSA geojson from geoBoundaries. Copy methods from previous vignette.

```r
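# Sketch reusing the geoBoundaries pattern from the acquisition vignette
usa <- httr::GET("https://www.geoboundaries.org/api/current/gbOpen/USA/ADM1/")
usa <- sf::st_read(httr::content(usa)$gjDownloadURL)
# drop the non-contiguous states and territories, as before
drops <- c("Alaska", "Hawaii", "American Samoa", "Puerto Rico",
           "Commonwealth of the Northern Mariana Islands", "Guam",
           "United States Virgin Islands")
usa <- usa[!(usa$shapeName %in% drops), ]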

```


Start with the basic plotting commands: create the time series with `slice` or another method used in the previous vignette.

```r
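# Sketch, assuming monthly time steps in the 12-month integration stack loaded above;
# the December layer of each year summarizes that calendar year, giving one panel per year.
wsim_gldas |>
  dplyr::slice(index = seq(12, 180, by = 12), along = "time") |>
  plot(key.pos = 1)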

```

A suitable default palette will not exist, and the result will be difficult to read.

Try a built-in palette for stars (not sure if this is possible?).

Introduce the official WSIM palette. This may only be helpful within a `ggplot` function.

```r
# numbers are return period values for a composite surplus (blues) and deficit (reds) dataset
leg_colors<-c(
'#9B0039',
# -50 to -40
'#D44135',
# -40 to -20
'#FF8D43',
# -20 to -10
'#FFC754',
# -10 to -5
'#FFEDA3',
# -5 to -3
'#FFFFFF',
# -3 to 3
'#CEFFAD',
# 3 to 5
'#00F3B5',
# 5 to 10
'#00CFCE',
# 10 to 20
'#009ADE',
# 20 to 40
'#0045B5')
```

Once hot spots are identified, pick a region of interest to zoom in on using the 1-month integration dataset.

Load in the 1-month integration dataset and subset/index it to the region of interest (copy code from the previous vignette). Use `dplyr::slice` or another method to pull out just the 12 months from the year of interest. Code demonstrating these techniques is in the previous vignette.

```r
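# Sketch reusing the acquisition-vignette patterns; the year and region below are
# placeholders until the case study is chosen.
wsim_1mo <- stars::read_stars("composite_anom_1mo.nc", proxy = FALSE)
wsim_1mo <- wsim_1mo['both']
# keep the 12 months of the (placeholder) year of interest
keeps <- seq(lubridate::ymd("2011-01-01"), lubridate::ymd("2011-12-01"), by = "month")
wsim_1mo <- dplyr::filter(wsim_1mo, time %in% keeps)
# crop to a placeholder region of interest using the geoBoundaries layer from above
roi <- usa[usa$shapeName == "Texas", ]
wsim_1mo_roi <- wsim_1mo[roi]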

```

Create a multi-panel figure, with each panel representing 1 month, to identify the most intense months of drought or flooding. Starting with this one, maybe use `ggplot` with a nice palette, legend, and panel headings. We will probably have to use some sort of faceting to make the individual panels (might not be possible).

```r
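# Sketch: facet on the time dimension with stars::geom_stars(); the WSIM legend
# colors defined above are passed to a continuous fill scale as a rough stand-in
# for the official binned palette.
ggplot2::ggplot() +
  stars::geom_stars(data = wsim_1mo_roi) +
  ggplot2::facet_wrap(~ time) +
  ggplot2::scale_fill_gradientn(colors = leg_colors, na.value = "transparent") +
  ggplot2::coord_equal() +
  ggplot2::theme_minimal()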

```

## Use Case Background

Now that we've keyed in on a particular event, bring in the background information we've collected to discuss what actually happened.

## Point Location Time Series

Visualize an individual cell with particularly extreme or volatile values. Use Google Maps to identify the latitude/longitude of a place of interest--maybe an urban center or other important location in the region that suffered from the extreme event.

Create a vector with the point location.

```r
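# Sketch: build an sf point from an illustrative longitude/latitude pair
# (Austin, TX here; swap in the location of interest).
pt <- sf::st_sfc(sf::st_point(c(-97.7431, 30.2672)), crs = 4326)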

```

Use `stars::st_extract()` to extract raster values from the stack at the point location.

```r
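# Sketch: extract the monthly values at the point; the result is a small object
# with one value per time step
pt_series <- stars::st_extract(wsim_1mo_roi, pt)
pt_series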

```

The resulting data frame of time series values should be inspected. It may also need to be converted from wide format to long format so it can be plotted in `ggplot`. Use either `pivot_wider()`/`pivot_longer()` from `tidyr` or `dcast()`/`melt()` from `data.table`.

```r
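# Sketch: coerce the extraction to a data frame and inspect it
pt_df <- as.data.frame(pt_series)
head(pt_df)
# illustrative wide -> long reshape with tidyr (column names depend on the object):
# pt_long <- tidyr::pivot_longer(pt_df, cols = -c(geometry, time),
#                                names_to = "attribute", values_to = "anomaly")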

```

Once in the proper format, plot using `ggplot`.

```r
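# Sketch: monthly anomaly time series at the point location; the column name
# 'both' is assumed from the composite anomaly attribute selected earlier
ggplot2::ggplot(pt_df, ggplot2::aes(x = time, y = both)) +
  ggplot2::geom_line() +
  ggplot2::geom_point() +
  ggplot2::labs(x = NULL, y = "Composite anomaly (return period)")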

```

## Population Exposure Plot

Use Gridded Population of the World and `exactextractr` to determine the number of people exposed to a given anomaly for each month of the year.

Background info on GPW would be appropriate. Same with `exactextractr` and zonal statistics.

Load in the GPW data and the `exactextractr` package.

```r
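# exactextractr is already loaded in the Setup chunk; here we just bring in a GPW
# population-count layer. The file name is illustrative (GPW v4 is distributed by
# SEDAC and downloaded separately). terra, which is not in the Setup chunk, is used
# because exact_extract() expects terra or raster objects rather than stars.
gpw <- terra::rast("gpw_v4_population_count_rev11_2015_15_min.tif")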

```

Perform the time series zonal summary.

This might be a bit tricky; it's been a while for me, and I'll have to look up the proper code. Dan has good examples on the `exactextractr` package website.

The resulting data.frame will probably need to be transformed to long format (just like before) so it can be plotted.

```r
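# One possible approach (threshold and names are illustrative; assumes `gpw` and
# `usa` from earlier chunks). The monthly anomalies are re-read with terra rather
# than converted from stars because exact_extract() expects terra/raster objects.
wsim_tr <- terra::rast("composite_anom_1mo.nc", subds = "both")
# (in practice, first subset wsim_tr to the 12 months of the year of interest)
# align the population grid to the WSIM grid; a count-preserving resample would be
# preferable for strict totals, but this keeps the sketch simple
gpw_aligned <- terra::resample(gpw, wsim_tr[[1]])
# population in cells experiencing a deficit worse than a -10 return period,
# summed within each state for each monthly layer
deficit_mask <- wsim_tr < -10
pop_exposed  <- exactextractr::exact_extract(gpw_aligned * deficit_mask, usa,
                                             "sum", progress = FALSE)
# reshape to long format (one row per state-month) for plotting
exposure_long <- tidyr::pivot_longer(cbind(state = usa$shapeName, pop_exposed),
                                     cols = -state,
                                     names_to = "month", values_to = "people_exposed")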

```

Now plot the data in ggplot. I have some existing code I can pull to help with the plotting--or at least make it fancy.
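
A minimal sketch of that plot, assuming the long-format `exposure_long` object and the placeholder state from the chunks above:

```r
# monthly exposed-population totals for one (placeholder) state
ggplot2::ggplot(dplyr::filter(exposure_long, state == "Texas"),
                ggplot2::aes(x = month, y = people_exposed)) +
  ggplot2::geom_col() +
  ggplot2::labs(x = NULL, y = "Population exposed to severe deficit")
```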
