Prac4_solution.Rmd

---
title: Exercise 4<br> Variance estimation for systematic survey designs
author: Centre for Research into Ecological and Environmental Modelling <br> **University of St Andrews**
date: Introduction to distance sampling<br> August/September 2022
output: 
  rmdformats::readthedown:
    highlight: tango
bibliography: references.bib
csl: apa.csl
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```

::: {.alert .alert-success}
<strong>Solutions</strong> Variance estimation for systematic designs
:::

# Basic (default) variance estimation

Recall the data for this example, in which we have a strong gradient in animal density across our study region and at the same time we have a difference in the lengths of the transects, such that short transects are in regions of high animal density and long transects are in regions of low animal density.

```{r, echo=T, eval=T}
library(Distance)
data("Systematic_variance_2")
conversion.factor <- convert_units("metre", "kilometre", "square kilometre")
# Fit a simple model
sysvar2.hn <- ds(data=Systematic_variance_2, key="hn", adjustment="cos",
                 convert_units=conversion.factor)
sysvar2.hn$dht$individuals$D
sysvar2.hn$dht$individuals$N
```

The point estimates are good ($\hat D = 2,044$ animals per unit area and $\hat N=1,022$ - note the size of the area) but the precision obtained with the default estimator is poor: estimated abundance ranges from about 580 to 1,800 - a three-fold difference over which we are uncertain. Given that our survey covered 40% of the triangular region and had a good sample size (254 animals on 20 transects), this would be a disappointing result in practice.

# Variance estimation with bootstrapping

The function `bootdht_Nhat_summarize` pulls out the estimates of abundance $(\hat{N_i})$ for all bootstrap replicates $i = 1, \cdots, N_{boot}$ where $N_{boot}$ is the number of replicates requested.

The following command performs the bootstrap.

```{r, quiet=T, suppress_messages=T, message=F, results=F}
est.boot <- bootdht(model=sysvar2.hn, flatfile=Systematic_variance_2,
                    summary_fun=bootdht_Nhat_summarize, 
                    convert_units=conversion.factor, nboot=100)
```

```{r, eval=T}
summary(est.boot)
```

The bootstrap results are very similar to the analytical results, as we would expect, because again this process assumed the transects were placed at random.

# Post-stratification to improve variance estimation

```{r, echo=T, eval=T}
## Post-stratification by O2 estimator
# ensure that Sample.Labels are numeric, for O2 ordering
Systematic_variance_2$Sample.Label <- as.numeric(Systematic_variance_2$Sample.Label)
# Using the Fewster et al 2009, "O2" estimator 
est.O2 <- dht2(sysvar2.hn, flatfile=Systematic_variance_2,   strat_formula=~1, convert_units=conversion.factor, er_est="O2")
print(est.O2, report="density")
```

The precision of the estimated abundance has greatly improved in the post-stratified analysis [@fewster2009] .

It must be remembered that we have not made any change to our data by the post-stratification; we are using getting a better estimate of the variance. In this case, the increase in precision could make a fundamental difference to the utility of the survey: it might make the difference between being able to make a management decision or not. Usually, trends will not be as extreme as they are in this example and post-stratification will not make a great difference. Such an situation is illustrated in the next problem.

# Systematic designs where post-stratification is not needed (optional)

These data did not exhibit strong trends across the survey region and, hence, there are no great differences between the CVs and 95% confidence intervals using the two methods.

```{r, echo=T, eval=T}
# Access the data
data("Systematic_variance_1")
# Ensure that Sample.Labels are numeric, for O2 ordering
Systematic_variance_1$Sample.Label <- as.numeric(Systematic_variance_1$Sample.Label)
# First fit a simple model
sysvar1.hn <- ds(Systematic_variance_1, key="hn", adjustment=NULL, 
                 convert_units=conversion.factor)
# Obtain default estimates for comparison
sysvar1.hn$dht$individuals$D
sysvar1.hn$dht$individuals$N
# Now use Fewster et al 2009, "O2" estimator 
est2.O2 <- dht2(sysvar1.hn, flatfile=Systematic_variance_1, strat_formula=~1,
                convert_units=conversion.factor, er_est="O2")
print(est2.O2, report="density")
```

# References