-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathPrac4_solution.Rmd
91 lines (71 loc) · 4.49 KB
/
Prac4_solution.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
---
title: Exercise 4<br> Variance estimation for systematic survey designs
author: Centre for Research into Ecological and Environmental Modelling <br> **University of St Andrews**
date: Introduction to distance sampling<br> August/September 2022
output:
rmdformats::readthedown:
highlight: tango
bibliography: references.bib
csl: apa.csl
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```
::: {.alert .alert-success}
<strong>Solutions</strong> Variance estimation for systematic designs
:::
# Basic (default) variance estimation
Recall the data for this example, in which we have a strong gradient in animal density across our study region and at the same time we have a difference in the lengths of the transects, such that short transects are in regions of high animal density and long transects are in regions of low animal density.
```{r, echo=T, eval=T}
library(Distance)
data("Systematic_variance_2")
conversion.factor <- convert_units("metre", "kilometre", "square kilometre")
# Fit a simple model
sysvar2.hn <- ds(data=Systematic_variance_2, key="hn", adjustment="cos",
convert_units=conversion.factor)
sysvar2.hn$dht$individuals$D
sysvar2.hn$dht$individuals$N
```
The point estimates are good ($\hat D = 2,044$ animals per unit area and $\hat N=1,022$ - note the size of the area) but the precision obtained with the default estimator is poor: estimated abundance ranges from about 580 to 1,800 - a three-fold difference over which we are uncertain. Given that our survey covered 40% of the triangular region and had a good sample size (254 animals on 20 transects), this would be a disappointing result in practice.
# Variance estimation with bootstrapping
The function `bootdht_Nhat_summarize` pulls out the estimates of abundance $(\hat{N_i})$ for all bootstrap replicates $i = 1, \cdots, N_{boot}$ where $N_{boot}$ is the number of replicates requested.
The following command performs the bootstrap.
```{r, quiet=T, suppress_messages=T, message=F, results=F}
est.boot <- bootdht(model=sysvar2.hn, flatfile=Systematic_variance_2,
summary_fun=bootdht_Nhat_summarize,
convert_units=conversion.factor, nboot=100)
```
```{r, eval=T}
summary(est.boot)
```
The bootstrap results are very similar to the analytical results, as we would expect, because again this process assumed the transects were placed at random.
# Post-stratification to improve variance estimation
```{r, echo=T, eval=T}
## Post-stratification by O2 estimator
# ensure that Sample.Labels are numeric, for O2 ordering
Systematic_variance_2$Sample.Label <- as.numeric(Systematic_variance_2$Sample.Label)
# Using the Fewster et al 2009, "O2" estimator
est.O2 <- dht2(sysvar2.hn, flatfile=Systematic_variance_2, strat_formula=~1, convert_units=conversion.factor, er_est="O2")
print(est.O2, report="density")
```
The precision of the estimated abundance has greatly improved in the post-stratified analysis [@fewster2009] .
It must be remembered that we have not made any change to our data by the post-stratification; we are using getting a better estimate of the variance. In this case, the increase in precision could make a fundamental difference to the utility of the survey: it might make the difference between being able to make a management decision or not. Usually, trends will not be as extreme as they are in this example and post-stratification will not make a great difference. Such an situation is illustrated in the next problem.
# Systematic designs where post-stratification is not needed (optional)
These data did not exhibit strong trends across the survey region and, hence, there are no great differences between the CVs and 95% confidence intervals using the two methods.
```{r, echo=T, eval=T}
# Access the data
data("Systematic_variance_1")
# Ensure that Sample.Labels are numeric, for O2 ordering
Systematic_variance_1$Sample.Label <- as.numeric(Systematic_variance_1$Sample.Label)
# First fit a simple model
sysvar1.hn <- ds(Systematic_variance_1, key="hn", adjustment=NULL,
convert_units=conversion.factor)
# Obtain default estimates for comparison
sysvar1.hn$dht$individuals$D
sysvar1.hn$dht$individuals$N
# Now use Fewster et al 2009, "O2" estimator
est2.O2 <- dht2(sysvar1.hn, flatfile=Systematic_variance_1, strat_formula=~1,
convert_units=conversion.factor, er_est="O2")
print(est2.O2, report="density")
```
# References