From 81d674a03728005fbffc2b5762439cd400ef85a0 Mon Sep 17 00:00:00 2001
From: Josh Brinks
Date: Mon, 11 Dec 2023 11:17:38 -0500
Subject: [PATCH] setup

---
 .gitignore                       |    4 +
 .nojekyll                        |    0
 _quarto.yml                      |   16 +
 docs/.nojekyll                   |    0
 docs/index.html                  |    8 +
 docs/search.json                 |  121 +
 docs/wsim-gldas-acquisition.html | 4987 ++++++++++++++++++++++++++++++
 docs/wsim-gldas-vis.html         | 4892 +++++++++++++++++++++++++++++
 wsim-gldas-acquisition.qmd       |  161 +
 wsim-gldas-vis.qmd               |  177 ++
 10 files changed, 10366 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 .nojekyll
 create mode 100644 _quarto.yml
 create mode 100644 docs/.nojekyll
 create mode 100644 docs/index.html
 create mode 100644 docs/search.json
 create mode 100644 docs/wsim-gldas-acquisition.html
 create mode 100644 docs/wsim-gldas-vis.html
 create mode 100644 wsim-gldas-acquisition.qmd
 create mode 100644 wsim-gldas-vis.qmd

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..dcf1b7b
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,4 @@
+/.quarto/
+wsim-gldas-vis_files
+water-wsim-gldas-v1-composite-anom-one-month-netcdf
+water-wsim-gldas-v1-composite-one-month-netcdf
diff --git a/.nojekyll b/.nojekyll
new file mode 100644
index 0000000..e69de29
diff --git a/_quarto.yml b/_quarto.yml
new file mode 100644
index 0000000..d3fe203
--- /dev/null
+++ b/_quarto.yml
@@ -0,0 +1,16 @@
+project:
+  title: "TOPS SCHOOL Water Resources Module"
+  type: website
+  output-dir: docs
+format:
+  html:
+    embed-resources: true
+    theme:
+      - sandstone
+    css: "https://github.com/ciesin-geospatial/TOPSTSCHOOL/blob/main/custom.scss"
+    toc: true
+    toc-title: Contents
+    toc-location: left
+
+
+
diff --git a/docs/.nojekyll b/docs/.nojekyll
new file mode 100644
index 0000000..e69de29
diff --git a/docs/index.html b/docs/index.html
new file mode 100644
index 0000000..399b759
--- /dev/null
+++ b/docs/index.html
@@ -0,0 +1,8 @@
+[8-line HTML redirect page titled "Redirect to wsim-gldas-acquisition.html"; markup stripped in extraction]
diff --git a/docs/search.json b/docs/search.json
new file mode 100644
index 0000000..fb3f3c9
--- /dev/null
+++ b/docs/search.json
@@ -0,0 +1,121 @@
+[generated Quarto search index (121 lines); its entries duplicate the lesson text in wsim-gldas-acquisition.qmd and wsim-gldas-vis.qmd below]
diff --git a/docs/wsim-gldas-acquisition.html b/docs/wsim-gldas-acquisition.html
new file mode 100644
index 0000000..62c2e10
--- /dev/null
+++ b/docs/wsim-gldas-acquisition.html
@@ -0,0 +1,4987 @@
+[4987 lines of rendered Quarto HTML; generated from wsim-gldas-acquisition.qmd below]
diff --git a/docs/wsim-gldas-vis.html b/docs/wsim-gldas-vis.html
new file mode 100644
index 0000000..7f94a68
--- /dev/null
+++ b/docs/wsim-gldas-vis.html
@@ -0,0 +1,4892 @@
+[4892 lines of rendered Quarto HTML; generated from wsim-gldas-vis.qmd below]
diff --git a/wsim-gldas-acquisition.qmd b/wsim-gldas-acquisition.qmd
new file mode 100644
index 0000000..dfbdabe
--- /dev/null
+++ b/wsim-gldas-acquisition.qmd
@@ -0,0 +1,161 @@
+---
+title: "Acquiring and Pre-Processing the WSIM-GLDAS Dataset"
+author: "Josh Brinks"
+date: "December 6, 2023"
+---
+
+## TO DO
+
+ * Add context for package selection
+   + stars for raster management
+   + sf for vector/shapefile/geojson
+   + lubridate for date management
+ * More dataset context/explanation (geoBoundaries vs GADM).
+ * Citations and external links for data and packages.
+ * Decide on which WSIM-GLDAS version to use.
+   + Switch out the current 1-month integration for a 12-month integration anomaly.
+ * Write out the smaller pre-processed file to disk for potential use in a Binder workshop or subsequent lesson in the module.
+ * Introduce automated data acquisition with a SEDAC or earthdata package??
+
+## Introduction
+
+WSIM-GLDAS can be downloaded from [SEDAC](https://sedac.ciesin.columbia.edu/data/set/water-wsim-gldas-v1). Downloads are organized by combination of variable (composite surplus/deficit, temperature, PETmE, runoff, soil moisture, precipitation) and integration period (1, 3, 6, or 12 months). Each variable-integration combination is a single NetCDF raster file with a time dimension that contains a raster layer for each of the 804 months between January 1948 and December 2014. Some variables also contain multiple attributes, each with its own 804-layer time dimension. Hence, these are large files that can take upwards of 2 hours to download and may cause memory issues on some systems. We will work with the composite anomalies integrated over 1-month periods.
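+
+The TO DO above mentions automating this step; until then, here is a minimal download sketch. The URL is a placeholder -- SEDAC downloads sit behind an Earthdata login, so you would substitute the link your authenticated browser session generates.
+
+```{r, eval=FALSE}
+# placeholder URL -- replace with the authenticated download link from SEDAC
+url <- "https://sedac.ciesin.columbia.edu/downloads/composite_anom_1mo.nc"
+
+# large file (~1.7 GB); raise the timeout so the download is not cut off
+options(timeout = 7200)
+download.file(url, destfile = "composite_anom_1mo.nc", mode = "wb")
+```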
+
+## Reading the Data
+
+Once you've completed the download and placed the .nc file in your working directory, read it in with the `stars::read_stars()` function.
+
+```{r}
+# proxy = TRUE will limit memory usage, but does
+# not always work with certain downstream processing functions
+wsim_gldas_anoms <- stars::read_stars("composite_anom_1mo.nc", proxy = FALSE)
+
+print(wsim_gldas_anoms)
+```
+
+The `print` command gives some basic information. The output tells us we have 5 attributes (deficit, deficit_cause, surplus, surplus_cause, both) and 3 dimensions. The first 2 dimensions are the spatial extents (x/y--longitude/latitude) and time is the 3rd dimension.
+
+This means the total number of individual raster layers in this NetCDF is 4020 (5 attributes x 804 time steps/months).
+
+## Attribute Selection
+
+We can start paring this down by subsetting for just the combined surplus/deficit anomaly (both).
+
+```{r}
+names(wsim_gldas_anoms)
+
+wsim_gldas_anoms <- wsim_gldas_anoms['both']
+```
+
+## Time Selection
+
+Specifying a temporal range of interest will free up even more memory. We'll keep every month from 2000 through 2014. This can be accomplished by generating a sequence of dates for every month between January 2000 and December 2014, and then passing that vector to `dplyr::filter`.
+
+```{r}
+# generate a vector of dates for subsetting
+keeps <- seq(lubridate::ymd("2000-01-01"),
+             lubridate::ymd("2014-12-01"),
+             by = "month")
+# filter using that vector
+wsim_gldas_anoms <- dplyr::filter(wsim_gldas_anoms, time %in% keeps)
+
+print(wsim_gldas_anoms)
+```
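+
+As a quick sanity check on the subset (a minimal sketch using stars' dimension helper), we can confirm the filter kept exactly the months we asked for:
+
+```{r}
+# pull the values of the time dimension from the stars object
+times <- stars::st_get_dimension_values(wsim_gldas_anoms, "time")
+
+length(times) # 180 months
+range(times)  # 2000-01-01 through 2014-12-01
+```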
+
+Now we're down to a single attribute ("both") with 180 time steps. We can take a look at the first 6 months by passing the object through `slice` and then into `plot`.
+
+```{r}
+wsim_gldas_anoms |>
+  dplyr::slice(index = 1:6, along = "time") |>
+  plot(key.pos = 1)
+```
+
+Although we've pared it down to a single attribute with a restricted time period of interest, we can take it a step further and reduce the spatial extent to a country or state of interest.
+
+## Spatial Selection
+
+We can spatially crop the raster stack with a few different methods. Options include using a bounding box (xmin, ymin, xmax, ymax), another raster object, or a vectorized boundary like a shapefile or geojson.
+
+Using a vector boundary is one of the more common geoprocessing tasks. In this example we'll pull a geojson of the United States from the geoBoundaries API. You can also download vectorized boundaries directly from [geoboundaries.org](https://www.geoboundaries.org).
+
+The call to geoBoundaries' API is pretty simple:
+
+"https://www.geoboundaries.org/api/current/gbOpen/*ISO3C*/*LEVEL*/"
+
+We adjust the italicized components of the URL to specify the country we want, using its three-letter ISO country code (USA) and the desired administrative level (ADM1).
+
+```{r}
+usa <- httr::GET("https://www.geoboundaries.org/api/current/gbOpen/USA/ADM1/")
+```
+
+After the `GET` call, we have to translate the `content`.
+
+```{r}
+usa <- httr::content(usa)
+
+names(usa)
+```
+
+The parsed content object contains 32 components. Component 29 is a direct link to the geojson file (gjDownloadURL). Read that in and check the visuals.
+
+```{r}
+usa <- sf::st_read(usa$gjDownloadURL)
+
+plot(sf::st_geometry(usa))
+```
+
+This looks good, but it includes all United States territories. For simplicity, we can get it down to only the contiguous United States.
+
+```{r}
+drops <-
+  c("Alaska", "Hawaii",
+    "American Samoa",
+    "Puerto Rico",
+    "Commonwealth of the Northern Mariana Islands",
+    "Guam",
+    "United States Virgin Islands")
+
+usa <- usa[!(usa$shapeName %in% drops), ]
+
+plot(sf::st_geometry(usa))
+```
+
+We can take this a step further and select a target state.
+
+```{r}
+texas <- usa[usa$shapeName == "Texas", ]
+
+plot(sf::st_geometry(texas))
+```
+
+From here we can crop the WSIM-GLDAS raster stack by indexing it with the stored boundary of Texas.
+
+```{r}
+wsim_gldas_anoms_tex <- wsim_gldas_anoms[texas]
+```
+
+For a final visual check, we'll take the last time step in the WSIM-GLDAS dataset (180; December 2014) and plot it with an overlay of the Texas boundary.
+
+```{r}
+wsim_gldas_anoms_tex |>
+  dplyr::slice(index = 180, along = "time") |>
+  plot(reset = FALSE)
+
+plot(sf::st_geometry(texas),
+     add = TRUE,
+     lwd = 3,
+     col = NA,
+     border = 'purple')
+```
+
+The subsetted dataset can be written to disk and saved for future modules.
+
+```{r}
+stars::write_stars(wsim_gldas_anoms_tex, "wsim_gldas_tex.nc")
+```
+
+The pre-processed dataset is 1.6 MB, compared to the 1.7 GB original. This is much more manageable in cloud environments, workshops, and git platforms.
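+
+As a final check (a short sketch; nothing here beyond base R and the `stars` call used above), the written file can be read back to confirm it survived the round trip:
+
+```{r}
+# re-read the subset we just wrote and verify its structure
+wsim_gldas_tex_check <- stars::read_stars("wsim_gldas_tex.nc")
+print(wsim_gldas_tex_check)
+
+# size on disk in MB
+file.size("wsim_gldas_tex.nc") / 1e6
+```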
\ No newline at end of file
diff --git a/wsim-gldas-vis.qmd b/wsim-gldas-vis.qmd
new file mode 100644
index 0000000..83a5c0d
--- /dev/null
+++ b/wsim-gldas-vis.qmd
@@ -0,0 +1,177 @@
+---
+title: "WSIM-GLDAS Dataset Exploration and Visualizations"
+author: "Joshua Brinks"
+date: "December 6, 2023"
+---
+
+## TO DO
+
+ * Write the actual code and narrative.
+ * Determine the region and time period of focus to draw in our use cases/human-focused stories.
+ * Determine the method of exploration.
+   + Mimic our process?
+     * 12-month integration panels of CONUS from 2000-2014 to identify areas of interest.
+     * Zoom in to locations of interest and switch to 1-month integration for the years identified in the previous step.
+
+## Introduction
+
+ * Raster/vector visualization background?
+   + General
+   + Water-resource specific
+ * Package background
+   + Basic plotting with stars/sf
+   + More advanced plotting with ggplot/ggmap
+
+## Setup
+
+```r
+library(stars) # raster manipulation
+library(sf) # vector manipulation
+library(ggplot2) # advanced plotting
+library(lubridate) # date/time manipulation
+library(exactextractr) # zonal statistics
+```
+
+## Load Data
+
+Load in data from the previous vignette.
+
+I think the previous vignette should end with a 2000-2014 12-month integration CONUS dataset.
+
+```r
+
+```
+
+Verify the data structure with `print` or `summary`.
+
+## Exploratory Histogram
+
+Create a histogram of raster values for a single time step.
+
+The basic plotting method is OK, but check if it can be done with `ggplot` so we can use a uniform palette across all modules.
+
+```r
+
+```
+
+Extreme values or other items of note might require additional visualization or other data exploration.
+
+## Multi-Panel Time Series
+
+Create a multi-panel time series of the 12-month integration CONUS data, similar to what we used to identify our case studies. Each panel will represent 1 year.
+
+Load in a CONUS geojson from geoBoundaries. Copy methods from the previous vignette.
+
+```r
+
+```
+
+Start with the basic plotting commands--create the time series with `slice` or the other method used in the previous vignette.
+
+```r
+
+```
+
+The default palette will not match WSIM conventions and will be difficult to interpret.
+
+Try a built-in palette for stars (not sure if possible?).
+
+Introduce the official WSIM palette. This may only be helpful within a `ggplot` function.
+
+```r
+# numbers are return-period values for a composite surplus (blues) and deficit (reds) dataset
+leg_colors<-c(
+    '#9B0039',
+    # -50 to -40
+    '#D44135',
+    # -40 to -20
+    '#FF8D43',
+    # -20 to -10
+    '#FFC754',
+    # -10 to -5
+    '#FFEDA3',
+    # -5 to -3
+    '#FFFFFF',
+    # -3 to 3
+    '#CEFFAD',
+    # 3 to 5
+    '#00F3B5',
+    # 5 to 10
+    '#00CFCE',
+    # 10 to 20
+    '#009ADE',
+    # 20 to 40
+    '#0045B5')
+```
+
+Once hot spots are identified, pick a region of interest to zoom in on using the 1-month integration dataset.
+
+Load in the 1-month integration dataset and subset/index it to the region of interest (copy code from the previous vignette). Use `dplyr::slice` or another method to pull out just the 12 months from the year of interest. Code demonstrating these techniques is in the previous vignette.
+
+```r
+
+```
+
+Create a multi-panel figure with each panel representing 1 month to identify the most intense months of drought or flooding. Starting with this one, maybe use `ggplot` with a nice palette, legend, and panel headings. Will probably have to use some sort of faceting to make individual panels; see the sketch below.
+
+```r
+
+```
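+
+Faceting does work with stars objects in ggplot2. Below is a minimal sketch under assumptions: `wsim_1mo_roi` is a hypothetical 12-layer (monthly) stars object from the step above, and the break/colour alignment for the WSIM palette will need tuning.
+
+```r
+library(ggplot2)
+library(stars)
+
+# geom_stars() draws the raster; facet_wrap() splits panels on the time dimension
+ggplot() +
+  geom_stars(data = wsim_1mo_roi) +
+  facet_wrap(~time, ncol = 4) +
+  scale_fill_stepsn(colours = leg_colors,
+                    # return-period bin edges matching the palette comments above
+                    breaks = c(-50, -40, -20, -10, -5, -3, 3, 5, 10, 20, 40)) +
+  theme_void() +
+  labs(fill = "Anomaly (return period)")
+```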
+
+## Use Case Background
+
+Now that we've keyed in on a particular event, bring in the backup information we've collected to discuss what actually happened.
+
+## Point Location Time Series
+
+Visualize an individual cell with particularly extreme or volatile values. Use Google Maps to identify the latitude/longitude of a place of interest. Maybe an urban center or other important location in the region that suffered from the extreme event.
+
+Create a vector with the point location.
+
+```r
+
+```
+
+Use `stars::st_extract` to extract raster values in the stack at the point location.
+
+```r
+
+```
+
+The resulting data frame of time series values should be inspected. It may also need to be converted from wide format to long format so it can be plotted in `ggplot`. Use either `pivot_wider`/`pivot_longer` from `tidyr` or `dcast`/`melt` from `data.table`.
+
+```r
+
+```
+
+Once in the proper format, plot using `ggplot`.
+
+```r
+
+```
+
+## Population Exposure Plot
+
+Use Gridded Population of the World and `exactextractr` to determine the number of people exposed to a given anomaly for each month of the year.
+
+Background info on GPW would be appropriate. Same with `exactextractr` and zonal statistics.
+
+Load in GPW data and the `exactextractr` package.
+
+```r
+
+```
+
+Perform the time series zonal summary; a sketch follows at the end of this section.
+
+This might be a bit tricky; it's been a while for me. Have to look up the proper code. Dan has good examples on the exactextractr package website.
+
+The resulting data.frame will probably need to be transformed to long format (just like before) so it can be plotted.
+
+```r
+
+```
+
+Now plot the data in ggplot. I have some existing code I can pull to help with the plotting--or at least make it fancy.
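+
+A sketch of the zonal-summary step referenced above, under assumptions: `wsim_year` is a hypothetical 12-layer SpatRaster of monthly anomalies, `gpw` a SpatRaster of population counts on the same grid, `roi` the sf region polygon, and -10 an example return-period threshold.
+
+```r
+library(terra)
+library(exactextractr)
+
+# flag cells at or beyond the example threshold: 1 = extreme deficit, 0 = not
+extreme <- terra::classify(wsim_year,
+                           rbind(c(-Inf, -10, 1),
+                                 c(-10,  Inf, 0)))
+
+# 'weighted_sum' = sum(flag * population * cell coverage fraction),
+# i.e., the people living in flagged cells; returns one column per month
+exposed <- exact_extract(extreme, roi, 'weighted_sum', weights = gpw)
+
+# reshape `exposed` to long format (as described above) before plotting
+```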