\documentclass[12pt]{article}
\usepackage{amsmath}
\usepackage{amssymb} % used for symbols in figure legends
%\usepackage{natbib} % used in formatting of citations and bibliography
\usepackage{url}
\usepackage{framed}
\usepackage{float}
\input{psfig}
%\input{boldGreek}
%\linenumbers
%\raggedright
%\setlength{\parindent}{.25in}
%\lhead{}
%
%\rhead{\thepage}
%\renewcommand\headrulewidth{0.4pt}
\input{boldGreek}
\floatstyle{plain}
\floatname{panel}{Panel}
\newfloat{panel}{h}{txt}
\renewcommand{\baselinestretch}{1.0}
\setlength{\textwidth}{6.5in}
%\setlength{\evensidemargin}{0.1875in}
%\setlength{\oddsidemargin}{0.1875in}
\setlength{\evensidemargin}{0in}
\setlength{\oddsidemargin}{0in}
\setlength{\textheight}{8.425in}
%\setlength{\headheight}{.5in}
%\setlength{\headsep}{.5in}
%\setlength{\parindent}{.25in}
\setlength{\topmargin}{0.375in}
% display page layout
%\usepackage{layouts}
%\newcommand\showpage{%
% \setlayoutscale{0.27}\setlabelfont{\tiny}%
% \printheadingsfalse\printparametersfalse
% \currentpage\pagedesign}
% paragraph formatting
\usepackage{indentfirst} % indent first line of paragraph in new sections
%\usepackage{setspace}
%\singlespacing
%
\usepackage{graphicx}
%\usepackage[nomarkers,tablesfirst,nolists]{endfloat} % places all tables and figures at end of document
\begin{document}
\begin{center}
\Large \textbf{
SCRbayes: MCMC for a Class of Spatial Capture-Recapture Models
}
\end{center}
%
\noindent
\section{Overview}
We are considering a capture-recapture model in which the observations
$y(i,j,k)$ are binary encounters of individual $i$ in trap $j$
during period (e.g., night) $k$.
The basic model is:
\[
y(i,j,k) \sim \mbox{Bern}(\pi(i,j,k))
\]
with
\[
\mbox{cloglog}(\pi(i,j,k)) = \alpha_{0} + \alpha_{1}x_{ijk} - (1/\sigma) d_{ij}^{2}
\]
There could be any number of fixed covariates in this model, but here
we have $x_{ijk}$, which I assume to be an indicator of previous capture,
i.e., a behavioral response.
In this case, the behavioral response operates at the trap level.
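To fix ideas, here is a minimal R sketch of the encounter probability
under this model; the function name and argument layout are
illustrative, not the \mbox{\tt SCRbayes} internals.
\begin{verbatim}
# Encounter probability under the cloglog model.
# alpha0, alpha1: regression coefficients; sigma: scale parameter
# x: covariate value (e.g., previous-capture indicator)
# d2: squared distance between the trap and the home range center
cloglog.p <- function(alpha0, alpha1, sigma, x, d2) {
  lp <- alpha0 + alpha1 * x - (1/sigma) * d2
  1 - exp(-exp(lp))  # inverse of the cloglog link
}
\end{verbatim}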
To attack this problem by MCMC we have to deal with 3 basic elements:
\begin{itemize}
\item[(1)] We don't know $N$. For that we use data augmentation.
\item[(2)] We have to update the ``structural parameters'' of the model.
These are the regression coefficients $\alpha_{0}$ and
$\alpha_{1}$, along with the scale parameter $\sigma$. I will consider
this part of the algorithm first since I think it can be economized
considerably with careful thought and programming.
\item[(3)] We have to update the spatial effects ${\bf s}$, which are
the individual home range centers. These enter through the distances
$d_{ij}$ between home range center $i$ and trap $j$ in the model above.
\end{itemize}
\section{Dealing with unknown $N$}
This is dealt with by data augmentation (Royle et al. 2007). The idea
is to introduce a large number of all-zero encounter histories and
recognize that the augmented data set (containing the all-zeros) is a
zero-inflated version of the known-$N$ model. To implement this we
introduce a collection of binary indicator variables
$z_{i}$ for each of the $i=1,2,\ldots, M$ individuals ($M$ = size of the
augmented data set), and the latent variables $z_{i}$ are updated as
part of the MCMC algorithm. Thus, when $z_{i}=1$, individual $i$ is a
member of the ``real'' population and its all-zero observation
${\bf y}_{i} = {\bf 0}$ is a sampling zero as opposed to a
fixed zero.
I will outline how this is done in an update of the document. The key
thing to keep in mind is that the rest of the MCMC algorithm proceeds
{\bf conditional on the present values of $z_{1},z_{2},\ldots,z_{M}$}.
This really simplifies things.
\section{Updating the Structural Parameters}
To do this, \mbox{\tt SCRbayes}
uses a standard component-wise Metropolis-Hastings
algorithm where we simply compute the log-likelihood for the
observations twice, once given the current parameter value and a second
time given a candidate parameter value. We use the random walk proposal
distribution exclusively. Let ${\bf y}$ be {\bf all} the observations
{\bf for which $z=1$}, stacked up in a vector
($z=1$ means that these are all of the individuals in the actual
population), and let ${\utheta} = (\alpha_{0}, \alpha_{1}, \sigma)$ be
the vector of structural parameters. We obtain a candidate value
${\utheta}^{*}$ by sampling from a normal distribution having
mean ${\utheta}$ (this is a random walk Metropolis sampler). We then
define:
\[
r = \min\left\{ 1,\, \exp\left( \log L({\bf y}|{\utheta}^{*}) - \log L({\bf y}|{\utheta}) \right) \right\}
\]
and we accept ${\utheta}^{*}$ with probability $r$. In practice we do
this by generating a uniform random number $u$ and accepting the
candidate if $u < r$.
We might also update each {\it component} of ${\utheta}$
individually so as to improve mixing of the Markov chains. In the
current version of the R code, \mbox{\tt SCRe.fn}, I am updating
$\sigma$ and $\alpha_{0}$ {\it together} and then doing $\alpha_{1}$ by
itself ({\bf NOTE:} sorry but my notation here is not entirely
consistent with the R code).
The key issue in implementing MCMC for this model is {\it efficient
computation of the log-likelihood}. For a situation with 100 traps
and 500 individuals (including pseudo-individuals) and 12 trap
occasions there are 600,000 total observations. This is an enormous
data set to work with. We have to subset the 500 individuals to
include only those for which $z=1$ and then compute the
log-likelihood. We can reduce the size considerably by summing over
the $K=12$ replicates, which is allowable if there are no ``time
effects''. However, we would like to compute $\log L$ efficiently for
the general case and so we focus on that for now.
OK, so what does $\log L$ look like? It is just a Bernoulli likelihood
of the form:
\[
L = \prod_{i,j,k} \pi(i,j,k)^{y(i,j,k)} (1-\pi(i,j,k))^{1-y(i,j,k)}
\]
where
\[
\pi(i,j,k) = 1-\exp(-\exp( \alpha_{0} + \alpha_{1}x_{ijk} - (1/\sigma) d_{ij}^{2} ))
\]
It is worth noting that if $y=0$ then the first part of $L$ is {\it
always} 1.0 whereas if $y=1$ then the 2nd part of $L$ is always 1.0.
Thus we can express the likelihood $L$ as:
\[
L = \prod_{y=1} \pi(i,j,k) \prod_{y=0} (1-\pi(i,j,k))
\]
Further, in terms of the log-likelihood we have
\[
\log L = \sum_{y=1} \log( \pi(i,j,k) ) - \sum_{y=0}
\exp( \alpha_{0} + \alpha_{1}x_{ijk} - (1/\sigma) d_{ij}^{2} )
\]
While my notation is not explicit here, I want to emphasize that the
2nd term sums $\log(1-\pi(i,j,k))$, which under the cloglog link is
simply $-\exp(\cdot)$, only over those values of $y$ that are equal to 0.
We should be able to come up with some very efficient ways of
evaluating $\log L$, either explicitly or by using some tricks, such as
only computing certain required components, or saving bits from
previous evaluations, etc. We will have to work on this.
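As a concrete starting point, here is a minimal R sketch of the
two-sum form of $\log L$ above; the function name and argument layout
are illustrative, not the actual \mbox{\tt SCRbayes} internals.
\begin{verbatim}
# Log-likelihood over the z = 1 individuals, using the identity
# log(1 - pi) = -exp(lp) under the cloglog link.
# y, x: arrays of dim c(M, J, K); d2: M x J squared distances
# keep: indices of the individuals with z = 1
scr.logL <- function(y, x, d2, keep, alpha0, alpha1, sigma) {
  y <- y[keep, , , drop = FALSE]
  x <- x[keep, , , drop = FALSE]
  # replicate the squared-distance matrix across the K occasions
  d2k <- array(d2[keep, , drop = FALSE], dim = dim(y))
  lp <- alpha0 + alpha1 * x - (1/sigma) * d2k
  # y = 1 terms: log(pi); y = 0 terms: log(1 - pi) = -exp(lp)
  sum(log(1 - exp(-exp(lp[y == 1])))) - sum(exp(lp[y == 0]))
}
\end{verbatim}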
OK, so consider an MH algorithm for the three parameters (a minimal
sketch in R follows the C steps below). If we update each
of those individually, the MH algorithm is structured like this:
\begin{itemize}
\item (A1) Evaluate $\log L({\bf y}|\alpha_{0},\alpha_{1},\sigma)$
\item (A2) Draw a candidate value of $\alpha_{0}$, say $\alpha_{0}^{*}$
\item (A3) Evaluate $\log L({\bf y}|\alpha_{0}^{*},\alpha_{1},\sigma)$
\item (A4) Accept with probability $r$ according to the rule given above.
\end{itemize}
\begin{itemize}
\item (B1) Evaluate $\log L({\bf y}|\alpha_{0},\alpha_{1},\sigma)$. Obviously
this should not require a complete reevaluation as the calculations
have been done in the A step.
\item (B2) Draw a candidate value of $\alpha_{1}$, say $\alpha_{1}^{*}$
\item (B3) Evaluate $\log L({\bf y}|\alpha_{0},\alpha_{1}^{*},\sigma)$
\item (B4) Accept with probability $r$ according to the rule given above.
\end{itemize}
\begin{itemize}
\item (C1) Evaluate $\log L({\bf y}|\alpha_{0},\alpha_{1},\sigma)$. Obviously
this should not require a complete reevaluation as the calculations
have been done in the B step.
\item (C2) Draw a candidate value of $\sigma$, say $\sigma^{*}$
\item (C3) Evaluate $\log L({\bf y}|\alpha_{0},\alpha_{1},\sigma^{*})$
\item (C4) Accept with probability $r$ according to the rule given above.
\end{itemize}
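Here is a minimal R sketch of one sweep of these component-wise
updates, assuming flat priors, the \mbox{\tt scr.logL} function from
the earlier sketch, and hypothetical tuning constants; recall that
\mbox{\tt SCRe.fn} actually updates $\sigma$ and $\alpha_{0}$ together.
\begin{verbatim}
# One sweep of component-wise random-walk Metropolis updates for
# theta = c(alpha0, alpha1, sigma); 'tune' holds proposal standard
# deviations (hypothetical values).
mh.sweep <- function(theta, y, x, d2, keep, tune = c(0.1, 0.1, 0.1)) {
  ll <- scr.logL(y, x, d2, keep, theta[1], theta[2], theta[3])
  for (p in 1:3) {
    cand <- theta
    cand[p] <- rnorm(1, theta[p], tune[p])
    if (p == 3 && cand[3] <= 0) next  # sigma <= 0: zero prior density
    ll.cand <- scr.logL(y, x, d2, keep, cand[1], cand[2], cand[3])
    if (runif(1) < exp(ll.cand - ll)) {  # accept w.p. min(1, r)
      theta <- cand
      ll <- ll.cand
    }
  }
  theta
}
\end{verbatim}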
\subsection{Alternative distance functions}
This is easy to do. For now we only allow for estimating the exponent
$\theta$ on the squared distance, so that the model can range from
$\theta = .5$ (exponential) to $\theta = 1$ (Gaussian).
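Concretely, assuming the exponent enters as a power on the squared
distance, the linear predictor becomes
\[
\mbox{cloglog}(\pi(i,j,k)) = \alpha_{0} + \alpha_{1}x_{ijk} - (1/\sigma)\left( d_{ij}^{2} \right)^{\theta}
\]
so that $\theta = .5$ gives a hazard declining in $d_{ij}$ and
$\theta = 1$ recovers the squared-distance (Gaussian) form; $\theta$
then gets its own Metropolis-Hastings update like the other structural
parameters.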
\section{The Data Augmentation Part of the Algorithm}
{\it
This step of the MCMC algorithm determines which of the data
augmentation variables, $z$, are equal to 1. It is the $z=1$ subset of
individuals for which the calculations in the previous Section are
carried out.
}
Bayesian analysis of this class of models is facilitated by data
augmentation. One of the motives for doing this is that Bayesian
analysis is carried out conditional on the individual activity centers
${\bf s}_{i}$. In principle, then, we could specify fairly complex
models describing how the activity centers are distributed in space
and then, in the MCMC, we only have to be able to simulate the
resulting point process appropriately. Conversely, if we tried to do
integrated likelihood then we would have an $N$-fold integral to carry
out.
Operationally, using data augmentation, we add a large number of
all-zero encounter histories to our data set. If $n$ is the number of
individuals encountered then let $M-n$ be the number we add to the
data set, chosen so that $M$ is a reasonable upper bound on the actual
population size $N$. Note that, in practice, we probably choose $M$ by
trial and error. The considerations are these: we want $M$ to be as
small as possible in order to save computation, but if it is too
small, then the posterior of $N$ will be truncated. That is bad. So
we have to fiddle with $M$ until we get it about right.
Given the augmented data set, the model contains a set of binary
indicator variables $z_{i}$ for $i=1,2,\ldots,M$. These are
``zero-inflation'' variables such that if $z_{i}=1$ then individual
$i$ is a member of the population that was exposed to sampling. In
other words, the observations ${\bf y}_{i}$ correspond to data that
arose by the binomial sampling process. Conversely, if $z_{i}=0$, then
the observations ${\bf y}_{i}$ are {\it fixed} zeros. In this way,
i.e., by zero-inflation, data augmentation accounts for the fact that
our augmented data set contains too many zeros.
The model for the $z_{i}$ variables is $z_{i} \sim
\mbox{Bernoulli}(\psi)$. As part of the MCMC algorithm we have to
impute the ``missing'' $z_{i}$ values. Note that $z_{i}=1$ for the
$i=1,2,\ldots,n$ observed individuals, necessarily. Thus we only need
to update $z_{i}$ for the augmented pseudo-individuals. This is a
simple application of Bayes' rule:
\[
Pr(z_{i}=1|{\bf y}_{i} = {\bf 0}) = \frac{Pr({\bf y}_{i}={\bf 0}|z_{i}=1)Pr(z_{i}=1) }{Pr({\bf y}_{i}={\bf 0})}
\]
Recall
\[
\pi(i,j,k) = 1-\exp(-\exp( \alpha_{0} + \alpha_{1}x_{ijk} - (1/\sigma) d_{ij}^{2} ))
\]
thus
\begin{eqnarray}
Pr({\bf y}_{i} = {\bf 0}|z_{i}=1) &=&
\prod_{j} \prod_{k} \exp(- \exp( \alpha_{0} + \alpha_{1}x_{ijk} - (1/\sigma) d_{ij}^{2} )) \\
& = & \exp(- \sum_{j} \sum_{k} \exp( \alpha_{0} + \alpha_{1}x_{ijk} - (1/\sigma) d_{ij}^{2} ))
\end{eqnarray}
in which, clearly, some economies should be realizable. In particular,
we should be able to compute this by recycling calculations from the
previous step, although the current version of the MCMC algorithm does
not make an effort to do that. As time permits (or we have a post-doc
or something), we can work on that. To get the denominator (the
marginal probability $Pr({\bf y}_{i} = {\bf 0})$) of this Bayes' rule quantity,
note that $Pr({\bf y}_{i}={\bf 0}|z_{i}=0) = 1$, so that
$Pr({\bf y}_{i} = {\bf 0}) = \psi Pr({\bf y}_{i}={\bf 0}|z_{i}=1) + (1-\psi)$.
Once these calculations are done, we simply flip coins for
$i=n+1,\ldots,M$ having probabilities $Pr(z_{i}=1|{\bf y}_{i} = {\bf 0})$, and those
coin flips determine the current values of $z_{n+1},\ldots,z_{M}$.
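A minimal R sketch of this update, assuming the quantities defined
earlier; \mbox{\tt lp} is an illustrative name for the array of linear
predictors, not an object in the package.
\begin{verbatim}
# Update z for the augmented pseudo-individuals i = n+1, ..., M.
# lp: M x J x K array of alpha0 + alpha1 * x - (1/sigma) * d2
# evaluated at the current parameter values; psi: current
# inclusion probability.
update.z <- function(z, lp, psi, n) {
  M <- length(z)
  for (i in (n + 1):M) {
    # Pr(y_i = 0 | z_i = 1) = exp(-sum_{j,k} exp(lp[i, j, k]))
    p0 <- exp(-sum(exp(lp[i, , ])))
    # Bayes' rule: Pr(z_i = 1 | y_i = 0)
    pz <- psi * p0 / (psi * p0 + (1 - psi))
    z[i] <- rbinom(1, 1, pz)  # flip the coin
  }
  z
}
\end{verbatim}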
\section{Updating $\{ {\bf s}_{i} \}$ -- the Activity Centers}
Right now we consider only the case where ${\bf s}$ is uniform on ${\cal
S}$, the collection of points defining the state-space. If $z=0$
then we draw ${\bf s}$ from the prior and it is accepted with
probability 1\footnote{Actually, we draw it randomly from a
neighborhood and accept it with probability 1 as long as it is not
close to the edge, in which case an adjustment is made for the asymmetry
of the proposal.}. If $z=1$ then we have
to look at the data to decide if a candidate value is acceptable or not.
The strategy is to use a ``random walk'' proposal distribution for
each ${\bf s}_{i}$.
Candidates are drawn by perturbing the current value locally. Specifically,
we start off with a collection of neighbors for each point ${\bf s}$, say
${\cal N}({\bf s})$, which are enumerated before the MCMC iterations begin.
The size of this set can be arbitrary.
Smaller neighborhoods require fewer calculations but mix more slowly.
We draw our candidate ${\bf s}^{*}$ from that set, evaluate the full
conditional $L({\bf y}|{\bf s}^{*})$, compare it to $L({\bf y}|{\bf s})$,
and accept or reject by the usual MH rule.
Note that for a small enough state-space grid we could draw directly
from the multinomial full conditional distribution, but I think
the calculation is too expensive for a fine grid.
Three cases require attention (a sketch in R follows this list):
\begin{itemize}
\item[(1)] The $n$ observed individuals.
\item[(2)] The $n_{0}$ individuals for which $z=1$ but ${\bf y}={\bf 0}$.
They all have the same full-conditional distribution.
\item[(3)] Individuals for which $z=0$, whose ${\bf s}$ are drawn from
the prior distribution.
\end{itemize}
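Here is a minimal R sketch of this update for one individual on a
discrete state-space, ignoring the edge adjustment mentioned in the
footnote above; \mbox{\tt nbr} and \mbox{\tt logL.i} are illustrative
names, not package objects.
\begin{verbatim}
# Update activity center s[i], an index into the state-space grid.
# nbr[[g]]: precomputed neighbors of grid point g (cases 1 and 2)
# logL.i(i, g): log-likelihood of individual i's data given center g
# nG: number of grid points in the state-space
update.s.i <- function(i, s, z, nbr, logL.i, nG) {
  if (z[i] == 0) {
    return(sample.int(nG, 1))  # case 3: draw from the uniform prior
  }
  cur <- s[i]
  cand <- nbr[[cur]][sample.int(length(nbr[[cur]]), 1)]  # random walk
  # symmetric proposal assumed (equal-sized neighborhoods)
  r <- exp(logL.i(i, cand) - logL.i(i, cur))
  if (runif(1) < r) cand else cur
}
\end{verbatim}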
\subsection{Covariates on Density}
We can easily allow for covariates to influence ${\bf s}$.
In particular, under the discrete state-space formulation,
${\bf s}_{i}$ is a multinomial trial.
In the context of data augmentation there is a slight blip: the
intercept is not identifiable, because that information is contained
in $\psi$ (see Royle et al., in review).
When we have a covariate that affects density, we just modify the full
conditional by inserting the multinomial prior distribution:
\[
\pi({\bf s}) = \frac{ \exp( \beta z({\bf s}) ) }{ \sum_{{\bf s}'} \exp( \beta z({\bf s}') ) }
\]
In this case, we have to update the parameter $\beta$, which we do
using a Metropolis-Hastings step with a random-walk proposal.
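A minimal R sketch of this $\beta$ update, assuming a flat prior on
$\beta$; \mbox{\tt zcov} (the covariate value at each grid point) and
the other names are illustrative.
\begin{verbatim}
# Random-walk MH update for the density covariate coefficient beta.
# The target here is the multinomial prior of the activity centers
# of the z = 1 individuals, evaluated at their current grid cells s.
update.beta <- function(beta, s, z, zcov, tune = 0.1) {
  logtarget <- function(b) {
    lp <- b * zcov
    sum(lp[s[z == 1]]) - sum(z == 1) * log(sum(exp(lp)))
  }
  cand <- rnorm(1, beta, tune)
  if (runif(1) < exp(logtarget(cand) - logtarget(beta))) cand else beta
}
\end{verbatim}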
\section{Goodness-of-Fit using a Bayesian p-value}
The current version of the R code (\mbox{\tt SCRe.fn}) has a
goodness-of-fit evaluation, based on a Bayesian p-value, built into it.
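For reference, here is a minimal sketch of how a Bayesian p-value is
typically computed from MCMC output; the discrepancy statistic and
names are illustrative and not necessarily what \mbox{\tt SCRe.fn}
uses.
\begin{verbatim}
# Bayesian p-value: at each posterior draw, compare a discrepancy
# statistic for the observed data against the same statistic for
# data simulated from the model at that draw.
# T.obs, T.rep: discrepancies (e.g., Freeman-Tukey,
# sum((sqrt(y) - sqrt(pi))^2)) collected over the MCMC iterations.
bayes.pvalue <- function(T.obs, T.rep) {
  mean(T.rep >= T.obs)  # values near 0 or 1 indicate lack of fit
}
\end{verbatim}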
\end{document}