---
title: "WSIM-GLDAS Dataset Exploration and Visualizations"
author: "Joshua Brinks & Elaine Fatumi"
date: "December, 6 2023"
---
## TO DO
* Write the actual code and narrative.
* Determine the region and time period of focus to draw in our use cases/human-focused stories.
* Determine the method of exploration.
    + Mimic our process?
        * 12-month integration panels of the CONUSA from 2000-2014 to identify areas of interest.
        * Zoom in to locations of interest and switch to 1-month integration for the years identified in the previous step.
## Introduction
* Raster/vector visualization background?
    + General
    + Water resource specific
* Package background
    + Basic plotting with stars/sf
    + More advanced plotting with ggplot/ggmap
## Setup
## Load Data
We'll start again with the WSIM-GLDAS 12-month integration anomaly file and quickly subset it to the continental United States.
```{r warning=FALSE}
#| code-fold: true
wsim_gldas <- stars::read_stars("composite_anom_12mo.nc", proxy = FALSE)

# generate a vector of dates for subsetting
keeps <- seq(lubridate::ymd("2000-12-01"),
             lubridate::ymd("2014-12-01"),
             by = "year")
# filter the time dimension using that vector
wsim_gldas <- dplyr::filter(wsim_gldas, time %in% keeps)

# you may want to clear your memory if your system is limited
gc()

# keep only the deficit attribute
wsim_deficit <- wsim_gldas['deficit']

# download the USA ADM1 boundaries from geoBoundaries
usa <- httr::GET("https://www.geoboundaries.org/api/current/gbOpen/USA/ADM1/")
usa <- httr::content(usa)
usa <- sf::st_read(usa$gjDownloadURL)

# drop states and territories outside the continental United States
drops <-
  c("Alaska", "Hawaii",
    "American Samoa",
    "Puerto Rico",
    "Commonwealth of the Northern Mariana Islands",
    "Guam",
    "United States Virgin Islands")
usa <- usa[!(usa$shapeName %in% drops), ]

# crop the raster to the CONUSA boundaries
wsim_deficit_usa <- wsim_deficit[usa]
```
Now we'll verify this with `print()` and `plot()`.
```{r}
print(wsim_deficit_usa)
```
The output shows that we've selected a single attribute ('deficit') and 15 time-steps in the 'time' dimension.
```{r warning=FALSE, message=FALSE}
wsim_deficit_usa |>
  dplyr::slice(index = 15, along = "time") |>
  plot(reset = FALSE, breaks = c(-50, -40, -20, -10, -5, 0))

plot(sf::st_geometry(usa),
     add = TRUE,
     lwd = 3,
     col = NA,
     border = 'purple')
```
## Exploratory Histogram
Create a histogram of the raster's deficit values for a single time step by extracting the values from the raster and plotting their distribution.
```{r}
# slice out the first time step in the file
usa1 <-
  wsim_deficit_usa |> dplyr::slice(along = "time", index = 1)
# extract the values into a data.frame
usa1 <- as.data.frame(as.numeric(usa1$deficit))
# name the column (the attribute name is lost in the conversion)
names(usa1) <- "Deficit"
```
Check the values.
```{r}
ggplot2::ggplot(usa1, ggplot2::aes(Deficit)) +
  ggplot2::geom_histogram(na.rm = TRUE)
```
There are some extreme outliers, so we can zoom in on the majority of values by setting x-axis limits.
```{r}
ggplot2::ggplot(usa1, ggplot2::aes(Deficit)) +
  ggplot2::geom_histogram(na.rm = TRUE) +
  ggplot2::xlim(c(-10, 0))
```
Extreme values or other items of note might require additional visualization or other data exploration.
## Multi-Panel Time Series
Create a multi-panel time series of the 12-month integration CONUSA data, similar to what we used to identify our case studies. Each panel will represent one year.
Load in a CONUSA GeoJSON from geoBoundaries (copy the methods from the previous vignette).
```{r warning = FALSE}
# legend colors for the WSIM-GLDAS deficit palette
leg_colors <- c(
  '#9B0039',
  # -50 to -40
  '#D44135',
  # -40 to -20
  '#FF8D43',
  # -20 to -10
  '#FFC754',
  # -10 to -5
  '#FFEDA3',
  # -5 to -3
  '#FFFFFF')

wsim_deficit_usa |>
  plot(reset = FALSE, col = leg_colors)
```
Once hot spots are identified, pick a region of interest to zoom in on using the 1-month integration dataset.
Load in the 1-month integration dataset and subset/index it to the region of interest (copy the code from the previous vignette). Use `dplyr::slice()` or another method to pull out just the 12 months from the year of interest. Code demonstrating these techniques appears in the previous vignette.
```{r}
wsim_gldas_1mo <- stars::read_stars("composite_anom_1mo.nc", proxy = FALSE)
print(wsim_gldas_1mo)
```
Create a multi-panel figure, with each panel representing one month, to identify the most intense months of drought or flooding. Starting with this figure, we may switch to `ggplot2` for a nicer palette, legend, and panel headings. This will likely require some form of faceting to produce the individual panels, as sketched below.
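A possible sketch, assuming the year of interest identified above is 2014 (a placeholder choice), that the 1-month file's time stamps fall on the first of each month, and reusing the `usa` boundaries from earlier. Faceting on the time dimension with `stars::geom_stars()` should produce one panel per month:

```r
# subset the 1-month file to the 12 months of the (hypothetical) year of interest
months_keep <- seq(lubridate::ymd("2014-01-01"),
                   lubridate::ymd("2014-12-01"),
                   by = "month")
wsim_1mo_yr <- dplyr::filter(wsim_gldas_1mo, time %in% months_keep)

# keep the deficit attribute and crop to the CONUSA boundaries
wsim_1mo_yr <- wsim_1mo_yr['deficit'][usa]

# facet on the time dimension to get one panel per month
ggplot2::ggplot() +
  stars::geom_stars(data = wsim_1mo_yr) +
  ggplot2::facet_wrap(~time, ncol = 4) +
  ggplot2::scale_fill_viridis_c(na.value = "transparent") +
  ggplot2::coord_equal() +
  ggplot2::theme_minimal() +
  ggplot2::labs(fill = "Deficit")
```

So faceting does appear possible: `facet_wrap()` on the time dimension builds the individual panels.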
## Point Location Time Series
Visualize an individual cell with particularly extreme or volatile values. Use Google Maps to identify the latitude/longitude of a place of interest, perhaps an urban center or another important location in the region that suffered from the extreme event.
Create a vector with the point location.
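A minimal sketch; the coordinates below are placeholders, not a vetted location:

```r
# hypothetical point of interest (longitude/latitude placeholders)
pt <- sf::st_sfc(sf::st_point(c(-96.80, 32.78)), crs = 4326)
```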
Use `stars::st_extract()` to extract raster values in the stack at the point location.
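A sketch using the 1-month object and the point from above:

```r
# extract the deficit time series at the point location
pt_ts <- stars::st_extract(wsim_gldas_1mo['deficit'], pt)
pt_ts
```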
The resulting data frame of time series values should be inspected. It may also need to be converted from wide format to long format so it can be plotted in `ggplot`. Use either `pivot_longer()`/`pivot_wider()` from `tidyr` or `melt()`/`dcast()` from `data.table`.
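In a quick sketch, the `st_extract()` output converts to a data frame that is already long (one row per time step); the commented reshape shows the fallback if a wide table appears instead:

```r
# convert to a data.frame with one row per time step
pt_df <- as.data.frame(pt_ts)
head(pt_df)

# if the table comes out wide instead, reshape it, e.g.:
# pt_df <- tidyr::pivot_longer(pt_df, cols = -time,
#                              names_to = "variable", values_to = "deficit")
```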
Once in the proper format, plot using `ggplot`.
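A minimal line plot of the extracted series (the column names assume the conversion above):

```r
ggplot2::ggplot(pt_df, ggplot2::aes(x = time, y = deficit)) +
  ggplot2::geom_line() +
  ggplot2::geom_point() +
  ggplot2::labs(x = NULL, y = "Deficit anomaly",
                title = "Monthly WSIM-GLDAS deficit at the point of interest")
```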
## Population Exposure Plot
Use Gridded Population of the World and `exactextractr` to determine the number of people exposed to a given anomaly for each month of the year.
Background info on GPW would be appropriate. Same with `exactextractr` and zonal statistics.
Load in the GPW data and the `exactextractr` package.
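A sketch assuming a local GeoTIFF of GPW v4 population counts at a resolution matching WSIM-GLDAS (the filename is hypothetical):

```r
library(exactextractr)

# GPW v4 population counts; the filename is a placeholder
pop <- terra::rast("gpw_v4_population_count_2015_15min.tif")
```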
Perform the time series zonal summary. This might be a bit tricky; it's been a while, so we'll have to look up the proper code. Dan has good examples on the `exactextractr` package website.
The resulting data.frame will probably need to be transformed to long format (just like before) so it can be plotted.
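A hedged sketch: flag cells at or below a deficit threshold each month, then use `exact_extract()` with population weights to sum the people living in flagged cells within each state. The threshold of -5 and the `subds` read are assumptions to check against Dan's examples:

```r
# read the deficit variable with terra for use with exactextractr
deficit_rast <- terra::rast("composite_anom_1mo.nc", subds = "deficit")

# flag cells in significant deficit (threshold is an assumption)
drought <- deficit_rast <= -5

# population-weighted sum of flagged cells: people exposed, per state per month
exposed <- exactextractr::exact_extract(drought, usa, 'weighted_sum',
                                        weights = pop)
exposed$shapeName <- usa$shapeName

# transform to long format for plotting
exposed_long <- tidyr::pivot_longer(exposed, cols = -shapeName,
                                    names_to = "month",
                                    values_to = "pop_exposed")
```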
Now plot the data in `ggplot`. I have some existing code I can pull to help with the plotting, or at least make it fancy.
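A minimal version of the exposure plot using the long data frame from above; the fancier existing code can replace this later:

```r
# total CONUSA population exposed, by month
totals <- aggregate(pop_exposed ~ month, data = exposed_long, FUN = sum)

ggplot2::ggplot(totals, ggplot2::aes(x = month, y = pop_exposed)) +
  ggplot2::geom_col() +
  ggplot2::labs(x = NULL, y = "People exposed",
                title = "Population exposed to deficit anomalies, by month")
```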