diff --git a/docs/wsim-gldas-acquisition.html b/docs/wsim-gldas-acquisition.html
index 4eebd0a..d28fa77 100644
--- a/docs/wsim-gldas-acquisition.html
+++ b/docs/wsim-gldas-acquisition.html
@@ -4645,7 +4645,7 @@
@@ -4939,7 +4939,25 @@

Spatial Selection

The subsetted dataset may be written to disk and saved for future modules.

-
stars::write_stars(wsim_gldas_anoms_tex, "wsim_gldas_tex.nc")
+
stars::write_mdim(wsim_gldas_anoms_tex, "wsim_gldas_tex.nc")
+
+
Warning in CPL_write_mdim(file, driver, dimx, cdl, wkt, xy, root_group_options,
+: GDAL Message 1: Dimension y lacks a indexing variable
+
+
+
Warning in CPL_write_mdim(file, driver, dimx, cdl, wkt, xy, root_group_options,
+: GDAL Message 1: Dimension x lacks a indexing variable
+
+
+
Warning in CPL_write_mdim(file, driver, dimx, cdl, wkt, xy, root_group_options,
+: GDAL Error 6: SetIndexingVariable() not implemented
+
+Warning in CPL_write_mdim(file, driver, dimx, cdl, wkt, xy, root_group_options,
+: GDAL Error 6: SetIndexingVariable() not implemented
+
+Warning in CPL_write_mdim(file, driver, dimx, cdl, wkt, xy, root_group_options,
+: GDAL Error 6: SetIndexingVariable() not implemented
+

Despite the GDAL messages about indexing variables, the file is written successfully. The size of the pre-processed dataset is 1.6 MB, compared to the original dataset of 1.7 GB; this is much more manageable in cloud environments, workshops, and git platforms.
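A later module can read the subset back in with stars::read_mdim(); a minimal sketch:

```r
# read the subsetted NetCDF written above back into a stars object
wsim_gldas_anoms_tex <- stars::read_mdim("wsim_gldas_tex.nc")
```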

diff --git a/docs/wsim-gldas-vis.html b/docs/wsim-gldas-vis.html
index 7f94a68..e4eb1bd 100644
--- a/docs/wsim-gldas-vis.html
+++ b/docs/wsim-gldas-vis.html
@@ -2,14 +2,14 @@
-WSIM-GLDAS TOPS SCHOOL Lesson - WSIM-GLDAS Dataset Exploration and Visualizations
+TOPS SCHOOL Water Resources Module - WSIM-GLDAS Dataset Exploration and Visualizations
@@ -4476,7 +4654,7 @@

TO DO

Setup

-
library(stars) # raster manipulation
-library(sf) # vector manipulation
-library(ggplot2) # advanced plotting
-library(lubridate) # date/time manipulation
-library(exactextractr) # zonal statistics

Load Data

-

Load in data from previous vignette.

-

I think the previous vignette should end with a 2000-2014 12-month integration CONUS dataset.

-
-

Verify data structure with print or summary.

+

We’ll start again with the WSIM-GLDAS 12 month integration anomaly file and quickly subset it to the continental United States.

+
+
+Code +
wsim_gldas <- stars::read_stars("composite_anom_12mo.nc", proxy = FALSE)
+
+
+
deficit, deficit_cause, surplus, surplus_cause, both, 
+
+
+Code +
keeps<-seq(lubridate::ymd("2000-12-01"),
+           lubridate::ymd("2014-12-01"), 
+           by = "year")
+# filter using that vector
+wsim_gldas <- dplyr::filter(wsim_gldas, time %in% keeps)
+# you may want to clear your memory if your system is limited
+gc()
+
+
+
           used  (Mb) gc trigger    (Mb)   max used    (Mb)
+Ncells  1004247  53.7    1701542    90.9    1701542    90.9
+Vcells 66539286 507.7 3291939742 25115.6 3492724869 26647.4
+
+
+Code +
# keep only the deficit attribute
wsim_deficit <- wsim_gldas['deficit']
+# download the USA state (ADM1) boundaries from geoBoundaries
+
+usa <- httr::GET("https://www.geoboundaries.org/api/current/gbOpen/USA/ADM1/")
+usa <- httr::content(usa)
+usa <- sf::st_read(usa$gjDownloadURL)
+
+
+
Reading layer `geoBoundaries-USA-ADM1' from data source 
+  `https://github.com/wmgeolab/geoBoundaries/raw/9469f09/releaseData/gbOpen/USA/ADM1/geoBoundaries-USA-ADM1.geojson' 
+  using driver `GeoJSON'
+Simple feature collection with 56 features and 5 fields
+Geometry type: MULTIPOLYGON
+Dimension:     XY
+Bounding box:  xmin: -179.1489 ymin: -14.54869 xmax: 179.7785 ymax: 71.36516
+Geodetic CRS:  WGS 84
+
+
+Code +
drops<-
+  c("Alaska", "Hawaii", 
+    "American Samoa",
+    "Puerto Rico",
+    "Commonwealth of the Northern Mariana Islands", 
+    "Guam", 
+    "United States Virgin Islands")
+
+usa<-usa[!(usa$shapeName %in% drops),]
+wsim_deficit_usa<-wsim_deficit[usa]
+
+
+

Now we’ll verify this with print() and plot().

+
+
print(wsim_deficit_usa)
+
+
stars object with 3 dimensions and 1 attribute
+attribute(s):
+         Min.   1st Qu.     Median       Mean     3rd Qu.     Max.   NA's
+deficit  -100 -1.271282 -0.7135856 -0.8666337 -0.05214565 3.489938 153810
+dimension(s):
+     from  to offset delta  refsys                    values x/y
+x     221 453   -180  0.25  WGS 84                      NULL [x]
+y     163 262     90 -0.25  WGS 84                      NULL [y]
+time    1  15     NA    NA POSIXct 2000-12-01,...,2014-12-01    
+
+
+

The output shows that we’ve selected a single attribute (‘deficit’) and 15 time-steps in the ‘time’ dimension.

+
+
wsim_deficit_usa |>
+  dplyr::slice(index = 15, along = "time") |>
+  plot(reset = FALSE, breaks = c(0,-5,-10,-20,-40,-50))
+
+plot(sf::st_geometry(usa),
+     add = TRUE,
+     lwd = 3,
+     fill = NA,
+     border = 'purple')
+
+
+
+

+
+
+
+

Exploratory Histogram

Create histogram of raster values for a single time step.

-

Basic plotting method is OK, but check if it can be done with ggplotso we can use a uniform palette across all modules.

-
+

Get the values out of the raster and create a histogram.

+
+
# filter for the first time-step in the file
+usa1 <-
+  wsim_deficit_usa |> dplyr::slice(along = "time", index = 1)
+
+# extract the values from the sliced object into a data.frame
+usa1 <- as.data.frame(as.numeric(usa1$deficit))
+# restore the column name (it is lost in the conversion)
+names(usa1) <- "Deficit"
+
+

Check the values

+
+
ggplot2::ggplot(usa1, ggplot2::aes(Deficit))+
+  ggplot2::geom_histogram(na.rm = TRUE)
+
+
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+
+
+
+
+

+
+
+
+
+

There are some extreme outliers; we can zoom in on the majority of values by setting x-axis limits.

+
+
ggplot2::ggplot(usa1, ggplot2::aes(Deficit))+
+  ggplot2::geom_histogram(na.rm = TRUE)+
+  ggplot2::xlim(c(-10,0))
+
+
`stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
+
+
+
+
+

+
+
+
+

Extreme values or other items of note might require additional visualization or other data exploration.

Multi-Panel Time Series

-

Create a multipanel time series of 12 month integration CONUSA; similar to what we used to identify our case studies. Each panel will represent 1 year.

+

Create a multi-panel time series of the 12-month integration CONUSA data, similar to what we used to identify our case studies. Each panel will represent one year.

We already loaded the CONUSA boundaries from geoBoundaries above, so we can move straight to plotting the full time series, colored with the deficit (red) half of the official WSIM palette.

-
-

Start with the basic plotting commands–create the time series with slice or other method used in previous vignette.

-
-

The palette will not exist and be difficult to use.

-

Try a built in palette for stars (not sure if possible?).

-

Introduce the official WSIM palette. This may only be helpful within a ggplot function.

-
# numbers are return period values for a composite surplus (blues) and deficit (reds) dataset
-leg_colors<-c(
-    '#9B0039',
-    # -50 to -40
-    '#D44135',
-    # -40 to -20
-    '#FF8D43',
-    # -20 to -10
-    '#FFC754',
-    # -10 to -5
-    '#FFEDA3',
-    # -5 to -3
-    '#FFFFFF',
-    # -3 to 3
-    '#CEFFAD',
-    # 3 to 5
-    '#00F3B5',
-    # 5 to 10
-    '#00CFCE',
-    # 10 to 20
-    '#009ADE',
-    # 20 to 40
-    '#0045B5')
+
+
# return period breakpoints for the deficit (red) half of the official WSIM palette
+leg_colors <- c(
+    '#9B0039',
+    # -50 to -40
+    '#D44135',
+    # -40 to -20
+    '#FF8D43',
+    # -20 to -10
+    '#FFC754',
+    # -10 to -5
+    '#FFEDA3',
+    # -5 to -3
+    '#FFFFFF')
+
+wsim_deficit_usa |>
+  plot(reset = FALSE,
+       col = leg_colors)
+
+
+
+

+
+
+
+

Once hot spots are easily identified, pick a region of interest to zoom in on using the 1-month integration dataset.

Load in the 1-month integration dataset and subset/index it to the region of interest, then use dplyr::filter or dplyr::slice to pull out just the 12 months from the year of interest. These techniques were demonstrated in the previous vignette, and a sketch of the subsetting step follows the printout below.

-
+
+
wsim_gldas_1mo <- stars::read_stars("composite_anom_1mo.nc", proxy = FALSE)
+
+
deficit, deficit_cause, surplus, surplus_cause, both, 
+
+
print(wsim_gldas_1mo)
+
+
stars object with 3 dimensions and 5 attributes
+attribute(s), summary of first 1e+05 cells:
+               Min.     1st Qu.      Median         Mean     3rd Qu.       Max.
+deficit        -100  -1.8314584  -0.2373874  -1.26453645  -0.2373874   1.896493
+deficit_cause     1 129.0000000 129.0000000 112.90956000 129.0000000 129.000000
+surplus        -100  -0.9671488  -0.7329655  -0.95631468  -0.6206152   2.384447
+surplus_cause     1 129.0000000 129.0000000 127.37130000 129.0000000 129.000000
+both              0   0.0000000   0.0000000   0.03784493   0.0000000   2.384447
+                NA's
+deficit        87340
+deficit_cause      0
+surplus        98724
+surplus_cause      0
+both           98724
+dimension(s):
+     from   to offset delta  refsys                    values x/y
+x       1 1440   -180  0.25  WGS 84                      NULL [x]
+y       1  600     90 -0.25  WGS 84                      NULL [y]
+time    1  804     NA    NA POSIXct 1948-01-01,...,2014-12-01    
+
+
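The region and year subsetting can reuse the patterns from earlier in this lesson. A minimal sketch, assuming a hypothetical year of interest (2014), the 'deficit' variable, and the CONUS `usa` boundaries loaded above:

```r
# a sketch: keep the deficit variable and the 12 months of a hypothetical
# year of interest (2014 here), then crop to the CONUS boundaries from above
months_keep <- seq(lubridate::ymd("2014-01-01"),
                   lubridate::ymd("2014-12-01"),
                   by = "month")

wsim_1mo_usa <- wsim_gldas_1mo['deficit'] |>
  dplyr::filter(time %in% months_keep)
wsim_1mo_usa <- wsim_1mo_usa[usa]
```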

Create a multi-panel figure with each panel representing one month to identify the most intense months of drought or flooding. For this figure we can use ggplot with a nicer palette, a legend, and panel headings; faceting on the time dimension produces the individual panels, as sketched below.
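One way to build the panels is stars::geom_stars() with ggplot2 faceting on the time dimension. This sketch assumes the wsim_1mo_usa subset and the leg_colors palette defined above, and uses a continuous approximation of the WSIM reds rather than the exact break-based palette:

```r
# a sketch: one panel per month, faceted on the time dimension
ggplot2::ggplot() +
  stars::geom_stars(data = wsim_1mo_usa) +
  ggplot2::scale_fill_gradientn(colours = leg_colors, na.value = "white") +
  ggplot2::facet_wrap(~time) +
  ggplot2::coord_equal() +
  ggplot2::theme_minimal() +
  ggplot2::labs(fill = "Deficit")
```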

-
-
-
-

Use Case Background

-

Now that we’ve keyed in on a particular event, bring in the backup information we’ve collected to discuss what actually happened.

+

Point Location Time Series

Visualize an individual cell with particularly extreme or volatile values. Use Google Maps to identify the latitude/longitude of a place of interest, such as an urban center or other important location in the region that suffered from the extreme event.

Create a vector with the point location.

-
+
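A minimal sketch with hypothetical coordinates (Dallas, Texas); substitute the location you identified:

```r
# a sketch: an sf point at a hypothetical location of interest (Dallas, TX)
pt <- sf::st_sfc(sf::st_point(c(-96.80, 32.78)), crs = 4326)
```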

Use stars::st_extract() to extract raster values in the stack at the point location.

-
+
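A minimal sketch, assuming the monthly subset (wsim_1mo_usa) and the point (pt) sketched above:

```r
# a sketch: pull the deficit time series at the point location
pt_ts <- stars::st_extract(wsim_1mo_usa, pt)
pt_ts
```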

The resulting data frame of time series values should be inspected. It may also need to be converted from wide format to long format so it can be plotted in ggplot; use pivot_longer()/pivot_wider() from tidyr or melt()/dcast() from data.table.

-
+
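With stars::st_extract() the result usually flattens to a long table directly; pivoting is only needed if your extraction comes back with one column per date. A sketch, assuming pt_ts from above:

```r
# a sketch: flatten the extracted series to a data.frame for ggplot
pt_df <- as.data.frame(pt_ts)
head(pt_df)
```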

Once in the proper format, plot using ggplot.

-
+
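A sketch, assuming pt_df contains time and deficit columns as above:

```r
# a sketch: monthly deficit at the point location
ggplot2::ggplot(pt_df, ggplot2::aes(x = time, y = deficit)) +
  ggplot2::geom_line() +
  ggplot2::geom_point() +
  ggplot2::labs(x = NULL, y = "Deficit (return period)")
```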

Population Exposure Plot

Use Gridded Population of the World and exactextractr to determine the number of people exposed to a given anomaly for each month of the year.

Background info on GPW would be appropriate. Same with exactextractr and zonal statistics.

Load in GPW data and the exactextractr package.

-
+
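A sketch of the loading step; the GPW file name below is hypothetical and assumes a population-count GeoTIFF has already been downloaded locally from SEDAC, read here with terra:

```r
library(exactextractr) # zonal statistics

# a sketch: load a locally downloaded GPW population-count raster
# (hypothetical file name)
pop <- terra::rast("gpw_v4_population_count_rev11_2015_30_sec.tif")
```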

Perform the time series zonal summary.

This can be a bit tricky; Dan has good worked examples on the exactextractr package website to follow.

The resulting data.frame will probably need to be transformed to long format (just like before) so it can be plotted.

-
+
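The building block is a per-polygon summary with exactextractr::exact_extract(); a sketch, assuming the pop raster and usa boundaries from above. Masking the population to cells that exceed a chosen anomaly threshold before summing, as in the examples on the exactextractr website, turns this into a monthly exposure count.

```r
# a sketch: total population per state, the building block for exposure counts
usa$total_pop <- exactextractr::exact_extract(pop, usa, fun = "sum")
```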

Now plot the data in ggplot. I have some existing code I can pull to help with the plotting–or at least make it fancy.
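A sketch of the final figure, assuming a hypothetical long data.frame exposure_long with month and exposed_pop columns:

```r
# a sketch: hypothetical monthly exposure totals
ggplot2::ggplot(exposure_long, ggplot2::aes(x = month, y = exposed_pop)) +
  ggplot2::geom_col() +
  ggplot2::labs(x = NULL, y = "Population exposed")
```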

Population Exposu + \ No newline at end of file diff --git a/wsim-gldas-vis.qmd b/wsim-gldas-vis.qmd index 83a5c0d..3046bb6 100644 --- a/wsim-gldas-vis.qmd +++ b/wsim-gldas-vis.qmd @@ -1,6 +1,6 @@ --- title: "WSIM-GLDAS Dataset Exploration and Visualizations" -author: "Joshua Brinks" +author: "Joshua Brinks & Elaine Fatumi" date: "December, 6 2023" --- @@ -24,65 +24,103 @@ date: "December, 6 2023" ## Setup -```r -library(stars) # raster manipulation -library(sf) # vector manipulation -library(ggplot2) # advanced plotting -library(lubridate) # date/time manipulation -library(exactextractr) # zonal statistics - -``` ## Load Data -Load in data from previous vignette. - -I think the previous vignette should end with a 2000-2014 12-month integration CONUS dataset. +We'll start again with the WSIM-GLDAS 12 month integration anomaly file and quickly subset it to the continental United States. + +```{r warning=FALSE} +#| code-fold: true +wsim_gldas <- stars::read_stars("composite_anom_12mo.nc", proxy = FALSE) +keeps<-seq(lubridate::ymd("2000-12-01"), + lubridate::ymd("2014-12-01"), + by = "year") +# filter using that vector +wsim_gldas <- dplyr::filter(wsim_gldas, time %in% keeps) +# you may want to clear your memory if your system is limited +gc() +wsim_deficit <- wsim_gldas['deficit'] +# generate a vector of dates for subsetting + +usa <- httr::GET("https://www.geoboundaries.org/api/current/gbOpen/USA/ADM1/") +usa <- httr::content(usa) +usa <- sf::st_read(usa$gjDownloadURL) +drops<- + c("Alaska", "Hawaii", + "American Samoa", + "Puerto Rico", + "Commonwealth of the Northern Mariana Islands", + "Guam", + "United States Virgin Islands") + +usa<-usa[!(usa$shapeName %in% drops),] +wsim_deficit_usa<-wsim_deficit[usa] +``` -```r +Now we'll verify this with `print()` and `plot()`. +```{r} +print(wsim_deficit_usa) +``` +The output shows that we've selected a single attribute ('deficit') and 15 time-steps in the 'time' dimension. + +```{r warning=FALSE, message=FALSE} +wsim_deficit_usa |> + dplyr::slice(index = 15, along = "time") |> + plot(reset = FALSE, breaks = c(0,-5,-10,-20,-40,-50)) + +plot(sf::st_geometry(usa), + add = TRUE, + lwd = 3, + fill = NA, + border = 'purple') ``` -Verify data structure with `print` or `summary.` ## Exploratory Histogram Create histogram of raster values for a single time step. -Basic plotting method is OK, but check if it can be done with `ggplot`so we can use a uniform palette across all modules. +Get the values out of the raster and create a histogram. -```r +```{r} +# filter for the first time-step in the file +usa1 <- + wsim_deficit_usa |> dplyr::slice(time, 1) +# extract the values into a data.frame +usa1<-as.data.frame(as.numeric(wsim_deficit_usa$deficit)) +# appropriately name the values (it was lost in the example) +names(usa1)<-"Deficit" ``` -Extreme values or other items of note might require additional visualization or other data exploration. - -## Multi-Panel Time Series -Create a multipanel time series of 12 month integration CONUSA; similar to what we used to identify our case studies. Each panel will represent 1 year. - -Load in a CONUSA geojson from geoBoundaries. Copy methods from previous vignette. - -```r +Check the values +```{r} +ggplot2::ggplot(usa1, ggplot2::aes(Deficit))+ + ggplot2::geom_histogram(na.rm = TRUE) ``` +There are some bad outliers, we can just zoom into the majority of values by setting x-axis limits. - -Start with the basic plotting commands--create the time series with `slice` or other method used in previous vignette. 
- -```r - +```{r} +ggplot2::ggplot(usa1, ggplot2::aes(Deficit))+ + ggplot2::geom_histogram(na.rm = TRUE)+ + ggplot2::xlim(c(-10,0)) ``` -The palette will not exist and be difficult to use. +Extreme values or other items of note might require additional visualization or other data exploration. -Try a built in palette for stars (not sure if possible?). +## Multi-Panel Time Series -Introduce the official WSIM palette. This may only be helpful within a `ggplot` function. +Create a multipanel time series of 12 month integration CONUSA; similar to what we used to identify our case studies. Each panel will represent 1 year.** -```r -# numbers are return period values for a composite surplus (blues) and deficit (reds) dataset -leg_colors<-c( +Load in a CONUSA geojson from geoBoundaries. Copy methods from previous vignette. + +```{r warning = FALSE} +wsim_deficit_usa |> + plot(reset = FALSE, + col = leg_colors<-c( '#9B0039', # -50 to -40 '#D44135', @@ -93,25 +131,17 @@ leg_colors<-c( # -10 to -5 '#FFEDA3', # -5 to -3 - '#FFFFFF', - # -3 to 3 - '#CEFFAD', - # 3 to 5 - '#00F3B5', - # 5 to 10 - '#00CFCE', - # 10 to 20 - '#009ADE', - # 20 to 40 - '#0045B5') + '#FFFFFF')) ``` Once hot spots are easily identified pick a region of interest to zoom in on using the 1 month integration dataset. Load in the 1 month integration dataset and subset/index the dataset to the region of interest (copy code from previous vignette). Use `dplyr::slice` or other method to pull out just the 12 months from the year of interest. Code demonstrating these techniques in previous vignette. -```r +```{r} +wsim_gldas_1mo <- stars::read_stars("composite_anom_1mo.nc", proxy = FALSE) +print(wsim_gldas_1mo) ``` Create a multi-panel figure with each panel representing 1 month to identify the most intense months of drought or flooding. Starting with this one maybe use `ggplot` and a nice palette, legend, and panel headings. Will probably have to use some sort of faceting to make individual panels (might not be possible). @@ -120,10 +150,6 @@ Create a multi-panel figure with each panel representing 1 month to identify the ``` -## Use Case Background - -Now that we've keyed in on a particular event, bring in the backup information we've collected to discuss what actually happened. - ## Point Location Time Series Visualize an individual cell with particular extreme or maybe volatile values. Use Google Maps to identify the latitude/longitude of a place of interest. Maybe an urban center or other important location in the region that suffered from the extreme event.