bv57_exploratory_density.qmd

---
title: "Analysis of UVP data along a transect"
---

# How does abundance change along the transect?

### 1 - We need to load in and parse through the data

We need to first import the data and select the core information that we need. The UVP project folder has all the casts we want and then some. We only want the BATS Validation cruise - bv57. We'll need to remove the hs casts.

#### Setting up the requisites

```{r echo = F, warning=F}
rm(list = ls())
library(EcotaxaTools)
library(tibble)
library(ggplot2)

```

#### Importing data and trimming list to keep only bv casts

```{r}
uvp_dat <- ecopart_import("~/BV_transect_export") #load in initial ecopart object

# we need to trim out the hs casts
# One way to do that would be to hard-code indices for those but that's not reliable
#So I'm using regex

#get index of hs data for both par and zoo files
hs_index_par <- grep("hs",names(uvp_dat$par_files))
hs_index_zoo <- grep('hs', names(uvp_dat$zoo_files))

uvp_dat$par_files <- uvp_dat$par_files[-hs_index_par]
uvp_dat$zoo_files <- uvp_dat$zoo_files[-hs_index_zoo]
```

### 2 - Identifying the meta-data

We want to now get a little idea of the data that we have. The main questions we have are how many locations are at each site, which are day vs night, and how deep does each cast go?

#### First, let's look at the time of day for these casts

All these data are in the meta file. Something to keep in mind is that UVP time is recorded in UTC - so you need to get sunrise/sunset times from [NOAA's Solar Calculator](https://gml.noaa.gov/grad/solcalc/sunrise.html). I'm not super precise with it, I just go with the month at longitude.

```{r}
# I'm writing a function for myself for later
get_time <- function(full_date) {
  time_stamp <- sapply(strsplit(as.character(full_date)," "),
                       `[[`,2)
  return(as.POSIXct(time_stamp, format = '%H:%M:%S'))
}

#my function above will just get the time and set it to todays date but that is fine
uvp_dat$meta$time_of_day <- sapply(get_time(uvp_dat$meta$sampledate), timeOfDay,'10:18:00','21:48:00',1) #this will add time of day to our metadata file

```

There's actually very few night casts, let's just go with the day casts then for this project.

#### Now we want to check the max-depth for all our casts

```{r}
# a quick funciton to find max depth
get_max_d <- function(zoo_file) {
  max_d <- zoo_file$depth_including_offset[which.max(zoo_file$depth_including_offset)]
  return(max_d)
}

max_d <- rep(NA, length(uvp_dat$zoo_files))
for(i in 1:length(uvp_dat$zoo_files)) {
  max_d[i] <- get_max_d(uvp_dat$zoo_files[[i]])
}
depth_tibble <- tibble(profileid = names(uvp_dat$zoo_files),
                       max_d = max_d)

uvp_dat$meta <- as_tibble(merge(uvp_dat$meta,
                                depth_tibble))
tibble(Profile_id = uvp_dat$meta$profileid,
       lat = uvp_dat$meta$latitude,
       tod = uvp_dat$meta$time_of_day,
       max_d = uvp_dat$meta$max_d)
```

Looking at this data, we have several casts going to at least 500m, some going to 1200m, and several going to 5000m. So what we can do is look at four ocean regions - epipelagic (0-200), upper mesopelagic (200-500), lower mesopelagic (500 - 1200), bathypelagic (1200 - 4000).

Next steps will be to calculate concentration for each profile, then look at integrated densities. But first we need to rename our data to core taxonomic groups.

#### First let's look at which taxonomic categories we have:

```{r}
list_taxa <- function(uvp_dat) {
  tdf <- do.call(rbind, uvp_dat$zoo_files)
  return(tdf$name)
} #should add name column option

cbind(names(table(list_taxa(uvp_dat))),
       table(list_taxa(uvp_dat)) / length(list_taxa(uvp_dat)))
```

From this, let's look at: Rhizaria, Copepods, Trichodesmium, Chaetognath, Annelida, Eumalacostraca. One issue we want to correct is that darksphere is classified as living and we want to change that to be non-living

```{r}
names <- c('living', "Rhizaria","Copepoda","Trichodesmium",
           "Chaetognatha",'Annelida',"Eumalacostraca",'not-living','darksphere')

uvp_dat <- uvp_dat %>% 
  add_zoo(names_to, 'name', names, suppress_print = T)

darksphere_switch <- function(zoo_file) {
  new_name <- zoo_file$name
  new_name[which(new_name == 'darksphere')] <- 'not-living'
  return(new_name)
}

uvp_dat <- add_zoo(uvp_dat, darksphere_switch, 'name')
tibble(names(table(list_taxa(uvp_dat))),
       table(list_taxa(uvp_dat)) / length(list_taxa(uvp_dat)))
```

### 3 - Calculating the concentration & Integrating

We now want to calculate the concentration of different taxa. We'll do this in 20m bins.

```{r}
conc_list <- vector('list',length(uvp_dat$zoo_files))
names(conc_list) <- names(uvp_dat$zoo_files)
for(i in 1:length(conc_list)) {
  conc_list[[i]] <- uvp_conc(uvp_dat,names(conc_list)[i],seq(0,max(uvp_dat$zoo_files[[i]]$depth_including_offset),20))
  
  conc_list[[i]]$mp <- get_bin_limtis(conc_list[[i]]$db)$mp
}
```

With this, we can now want the integrated abundance for our four ocean zones. We'll set up a holder shell first. Then fill it out with needed information about each cast which goes into it. We'll reference the meta data folder a lot for this:

```{r}
intg_abund <- vector('list', 4)
names(intg_abund) <- c('euphotic','upper_meso',
                       'lower_meso','bathy')

#a function I might use later
#' @param max_d a vector of max_depths corresponding to casts
#' @param depth_breaks numeric vector with zone limits
#' @param zone_labels character vector of names for lower limits
assign_zones <- function(max_d, depth_breaks, zone_labels) {
  if(length(depth_breaks) != length(zone_labels)) {
    stop('depth_breaks and zone_labels must be equal length')
  }
  
  d_match <- sapply(max_d, nearest, depth_breaks)
  zones <- zone_labels[match(d_match, depth_breaks)]
  return(zones)
}

uvp_dat$meta$zone <- sapply(uvp_dat$meta$max_d, assign_zones,
                            c(500,1200,4000),
                            c(2,3,4))

for(i in 1:length(intg_abund)) {
  intg_abund[[i]]$d_lim <- c(0,200,500,1200,4000)[c(i, i+1)]
  intg_abund[[i]] <- c(intg_abund[[i]], vector('list', 5))
  names(intg_abund[[i]]) <- c('d_lim','Trichodesmium','Rhizaria',
                            'Eumalacostraca','Copepoda', 'Chaetognatha')
  for(j in 2:length(intg_abund[[i]])) {
    intg_abund[[i]][[j]]$profile_id <- uvp_dat$meta$profileid[which(uvp_dat$meta$zone >= i)]
  }
}
```

Now we have a list shell set up to integrate abundance for each of our taxa of interest in each zone. This will be a fairly large loop since we're iterating the process many times over. This is definitely the quick and dirty way to get things done. In the future I might want to build out functions for each of these tasks then apply them to each individual taxa

```{r}

#ugly functions to avoid ridiculous looping
subset_taxa <- function(df, taxa) {
  rdf <- df[df$taxa == taxa,]
  return(rdf)
}

subset_d <- function(df, d_lower, d_upper) {
  rdf <- df[df$mp > d_lower &
              df$mp < d_upper, ]
  return(rdf)
}

org_loop <- function(df, lim_list) {
  intg <- trapz_integarate(df$mp, df$conc_m3,
                           lim_list$d_lim[1], lim_list$d_lim[2],
                           subdivisions = 500)
  return(intg$value)
}

nrow_check <- function(df) {
  if(nrow(df) == 0) {
    return(FALSE)
  } else {
    return(TRUE)
  }
}
  
for(i in 1:length(intg_abund)) {
  for(j in 2:length(intg_abund[[i]])) {
    temp_conc_list <- lapply(conc_list, subset_taxa, names(intg_abund[[i]])[j])
    temp_conc_list <- lapply(temp_conc_list, subset_d, intg_abund[[i]]$d_lim[1], intg_abund[[i]]$d_lim[2])
    temp_conc_list <- temp_conc_list[which(names(temp_conc_list) %in% intg_abund[[i]][[j]]$profile_id)]
    
    if(any(!(sapply(temp_conc_list,nrow_check)))) {
      drop <- which(!(sapply(temp_conc_list,nrow_check)))
      temp_conc_list <- temp_conc_list[-drop]
      intg_abund[[i]][[j]]$profile_id <- intg_abund[[i]][[j]]$profile_id[-drop]
    }
    
    intg_taxa <- sapply(temp_conc_list, org_loop, intg_abund[[i]])
    intg_abund[[i]][[j]]$num_m2 <- as.numeric(intg_taxa[match(names(intg_taxa),intg_abund[[i]][[j]]$profile_id)])
  }
}
```

Great! Now we have a big set of data with integrated abundances in each four depth zone. Now all we have to do is plot them according to their latitudes. Looking at the data however, it seems that we have really limited data for Eumalacostraca and Chaetognatha. So let's just look at Copepods, Rhizaria, and Trichodesmium.

### 4 - Plotting our data

Before we plot the data, it is nice to get everything into a clean plotting dataframe. I'm thinking that it'll be best to plot all three taxa on panels based on the depth

```{r}
#set the taxa we are intested in plotting
plot_taxa <- c('Trichodesmium', 'Rhizaria', 'Copepoda')

mush_into_df <- function(obj) {
  rdf <- as.data.frame(obj[[1]])
  rdf$taxa <- rep(names(obj), nrow(rdf))
  return(rdf)
}

plot_list <- vector('list', 4)
names(plot_list) <- c('Euphotic', 'Upper_Meso','Lower_Meso','Bathy')

for(i in 1:length(plot_list)) {
  for(j in 1:length(intg_abund[[i]][plot_taxa])) {
    plot_list[[i]][[j]] <- mush_into_df(intg_abund[[i]][plot_taxa[j]])
  }
  plot_list[[i]] <- do.call(rbind, plot_list[[i]])
  
  plot_list[[i]] <- merge(plot_list[[i]],
                          data.frame(profile_id = uvp_dat$meta$profileid,
                                     lat = uvp_dat$meta$latitude))
  
}
```

Great! Now everything is all set up for us to plot our figures based on latitude

```{r}
plot_output <- vector('list', 4)
for(i in 1:length(plot_output)) {
  plot_list[[i]] <- plot_list[[i]][!is.na(plot_list[[i]]$num_m2),]
  plot_output[[i]] <- ggplot(plot_list[[i]],aes(y = num_m2, x = lat, col = taxa)) +
   geom_point(size = 2) +
    geom_smooth(se = F, method = 'loess',
                size = 1, alpha = .05, span = 1)+
   coord_flip()+
    labs(x = 'Latitude', y = 'Integrated Abudance [Number/m2]',
         subtitle = names(plot_list)[i],
         col = "")+
   theme_bw()
}
print(plot_output)
```

Cool we have some figures to start working with. I'm saving a bit of data to and providing here. It is an RDS so to load it you just type readRDS('filename.rds').

```{r}
saveRDS(plot_list, './item01_integrated-abundances-by-zone')
```