-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path92-appendixC.Rmd
executable file
·444 lines (344 loc) · 37.9 KB
/
92-appendixC.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
# Fast approximate Bayesian inference {#naomi-aghq-appendix}
```{r}
resource_version <- list.files("resources/naomi-aghq")
```
## Epilepsy example
### `TMB` C++ template {#tmb-epil}
```{cpp, echo=TRUE, eval=FALSE, output.var="epil", code=readLines("figures/naomi-aghq/TMB/epil.cpp")}
```
### Modified `TMB` C++ template {#tmb-modified-epil}
```{cpp, echo=TRUE, eval=FALSE, output.var="epil", code=readLines("figures/naomi-aghq/TMB/epil_modified.cpp")}
```
### `Stan` C++ template {#stan-epil}
```{cpp, echo=TRUE, eval=FALSE, output.var="epil", code=readLines("figures/naomi-aghq/epil.stan")}
```
### NUTS convergence and suitability
#### `tmbstan` {#tmbstan-epil}
(ref:tmbstan-epil) Traceplots for the `tmbstan` parameters with the lowest ESS and highest potential scale reduction factor. These were `l_tau_nu` (an $\text{ESS}$ of 377) and `beta[3]` (an $\hat R$ of 1.006).
```{r tmbstan-epil, fig.cap="(ref:tmbstan-epil)"}
knitr::include_graphics("figures/naomi-aghq/tmbstan-epil.png")
```
#### `rstan` {#rstan-epil}
(ref:stan-epil) Traceplots for the `rstan` parameters with the lowest ESS and highest potential scale reduction factor. These were `tau_nu` (an $\text{ESS}$ of 437) and `tau_nu` (an $\hat R$ of 1.009). Rather than plotting the traceplot for `tau_nu` twice, the parameter `epsilon[18]` is included, which had the second highest $\hat R$ of 1.008.
```{r stan-epil, fig.cap="(ref:stan-epil)"}
knitr::include_graphics("figures/naomi-aghq/stan-epil.png")
```
\newpage
## Loa loa example
### NUTS convergence and suitability
(ref:nuts-loa-loa) Traceplots for the parameters with the lowest ESS and highest potential scale reduction factor for the Loa loa ELGM example.
```{r nuts-loa-loa, fig.cap="(ref:nuts-loa-loa)"}
knitr::include_graphics("figures/naomi-aghq/nuts-loa-loa.png")
```
### Inference comparison
(ref:loa-loa-mean-sd-pct) Relative difference between the Gaussian and Laplace marginal posterior means and standard deviations to NUTS results at each $u(s_i), v(s_i): i \in [190]$. Absolute differences are in Figure \@ref(fig:loa-loa-mean-sd-abs).
```{r loa-loa-mean-sd-pct, fig.cap="(ref:loa-loa-mean-sd-pct)"}
knitr::include_graphics("figures/naomi-aghq/loa-loa-mean-sd-pct.png")
```
\newpage
## AGHQ with Laplace marginals algorithm {#algorithm}
This section provides the INLA-like algorithm for AGHQ with Laplace marginals used in this thesis.
The algorithm for AGHQ with Gaussian marginals used in this thesis is as given in @stringer2022fast, and implemented in the `aghq` package.
1. Calculate the mode, Hessian at the mode, lower Cholesky, and Laplace approximation
\begin{align}
\hat{\boldsymbol{\mathbf{\theta}}} &= \arg \max_{\boldsymbol{\mathbf{\theta}}} {\tilde p_\texttt{LA}(\boldsymbol{\mathbf{\theta}}, \mathbf{y})}, \\
\hat{\mathbf{H}} &= - \frac{\partial^2}{\partial \boldsymbol{\mathbf{\theta}} \partial \boldsymbol{\mathbf{\theta}}^\top} \log \tilde p_\texttt{LA}(\boldsymbol{\mathbf{\theta}}, \mathbf{y}) \rvert_{\boldsymbol{\mathbf{\theta}} = \hat{\boldsymbol{\mathbf{\theta}}}}, \\
\hat{\mathbf{H}}^{-1} &= \hat{\mathbf{L}} \hat{\mathbf{L}}^\top, \\
\tilde p_\texttt{LA}(\boldsymbol{\mathbf{\theta}}, \mathbf{y}) &= \frac{p(\mathbf{y}, \mathbf{x}, \boldsymbol{\mathbf{\theta}})}{\tilde p_\texttt{G}(\mathbf{x} \, | \, \boldsymbol{\mathbf{\theta}}, \mathbf{y})} \Big\rvert_{\mathbf{x} = \hat{\mathbf{x}}(\boldsymbol{\mathbf{\theta}})},
\end{align}
where $\tilde p_\texttt{G}(\mathbf{x} \, | \, \boldsymbol{\mathbf{\theta}}, \mathbf{y}) = \mathcal{N}(\mathbf{x} \, | \, \hat{\mathbf{x}}(\boldsymbol{\mathbf{\theta}}), \hat{\mathbf{H}}(\boldsymbol{\mathbf{\theta}})^{-1})$ is a Gaussian approximation to $p(\mathbf{x} \, | \, \boldsymbol{\mathbf{\theta}}, \mathbf{y})$ with mode and precision matrix given by
\begin{align}
\hat{\mathbf{x}}(\boldsymbol{\mathbf{\theta}}) &= \arg \max_\mathbf{x} \log p(\mathbf{y}, \mathbf{x}, \boldsymbol{\mathbf{\theta}}), \\
\hat{\mathbf{H}}(\boldsymbol{\mathbf{\theta}}) &= - \frac{\partial^2}{\partial \mathbf{x} \partial \mathbf{x}^\top} \log p(\mathbf{y}, \mathbf{x}, \boldsymbol{\mathbf{\theta}}) \rvert_{\mathbf{x} = \hat{\mathbf{x}}(\boldsymbol{\mathbf{\theta}})}.
\end{align}
2. Generate a set of nodes $\mathbf{u} \in \mathcal{Q}(m, k)$ and weights $\omega: \mathbf{u} \to \mathbb{R}$ from a Gauss-Hermite quadrature rule with $k$ nodes per dimension.
Adapt these nodes based on the mode and lower Cholesky via $\boldsymbol{\mathbf{\theta}}(\mathbf{u}) = \hat{\boldsymbol{\mathbf{\theta}}} + \mathbf{L} \mathbf{u}$.
Use this quadrature rule to calculate the normalising constant $\tilde p_{\texttt{AQ}}(\mathbf{y})$ as follows
\begin{equation}
\tilde p_{\texttt{AQ}}(\mathbf{y}) = \sum_{\mathbf{u} \in \mathcal{Q}(m, k)} \tilde p_\texttt{LA}(\boldsymbol{\mathbf{\theta}}(\mathbf{u}), \mathbf{y}) \omega(\mathbf{u}). (\#eq:evidence-appendix)
\end{equation}
3. For $i \in [N]$ generate $l$ nodes $x_i(\mathbf{v})$ via a Gauss-Hermite quadrature rule $\mathbf{v} \in \mathcal{Q}(1, l)$ adapted based on the mode $\hat{\mathbf{x}}(\boldsymbol{\mathbf{\theta}})_i$ and standard deviation $\sqrt{\text{diag}[\hat{\mathbf{H}}(\boldsymbol{\mathbf{\theta}})^{-1}]_i}$ of the Gaussian marginal.
A value of $l \geq 4$ is recommended to enable B-spline interpolation.
For $x_i \in \{ x_i(\mathbf{v}) \}_{\mathbf{v} \in \mathcal{Q}(1, l)}$ and $\boldsymbol{\mathbf{\theta}} \in \{ \boldsymbol{\mathbf{\theta}}(\mathbf{u}) \}_{\mathbf{u} \in \mathcal{Q}(m, k)}$ calculate the modes and Hessians
\begin{align}
\hat{\mathbf{x}}_{-i}(x_i, \boldsymbol{\mathbf{\theta}}) &= \arg \max_{\mathbf{x}_{-i}} \log p(\mathbf{y}, x_i, \mathbf{x}_{-i}, \boldsymbol{\mathbf{\theta}}), \\
\hat{\mathbf{H}}_{-i, -i}(x_i, \boldsymbol{\mathbf{\theta}}) &= - \frac{\partial^2}{\partial \mathbf{x}_{-i} \partial \mathbf{x}_{-i}^\top} \log p(\mathbf{y}, x_i, \mathbf{x}_{-i}, \boldsymbol{\mathbf{\theta}}) \rvert_{\mathbf{x}_{-i} = \hat{\mathbf{x}}_{-i}(x_i, \boldsymbol{\mathbf{\theta}})},
\end{align}
where optimisation to obtain $\hat{\mathbf{x}}_{-i}(x_i, \boldsymbol{\mathbf{\theta}})$ can be initialised at $\hat{\mathbf{x}}(\boldsymbol{\mathbf{\theta}})_{-i}$.
4. For $x_i \in \{ x_i(\mathbf{v}) \}_{\mathbf{v} \in \mathcal{Q}(1, l)}$ calculate
\begin{equation}
p_\texttt{AQ}(x_i \, | \, \mathbf{y}) = \frac{\tilde p_\texttt{LA}(x_i, \mathbf{y})}{\tilde p_{\texttt{AQ}}(\mathbf{y})}, (\#eq:laplace-marginal-appendix)
\end{equation}
where
\begin{equation}
\tilde p_\texttt{LA}(x_i, \mathbf{y}) = \sum_{\mathbf{u} \in \mathcal{Q}(m, k)} \tilde p_\texttt{LA}(x_i, \boldsymbol{\mathbf{\theta}}(\mathbf{u}), \mathbf{y}) \omega(\mathbf{u}).
\end{equation}
and
\begin{equation}
\tilde p_\texttt{LA}(x_i, \boldsymbol{\mathbf{\theta}}, \mathbf{y}) = \frac{p(x_i, \mathbf{x}_{-i}, \boldsymbol{\mathbf{\theta}}, \mathbf{y})}{\tilde p_\texttt{G}(\mathbf{x}_{-i} \, | \, x_i, \boldsymbol{\mathbf{\theta}}, \mathbf{y})} \Big\rvert_{\mathbf{x}_{-i} = \hat{\mathbf{x}}_{-i}(x_i, \boldsymbol{\mathbf{\theta}})}.
\end{equation}
Equation \@ref(eq:laplace-marginal-appendix) can be calculated using the estimate of the evidence given in Equation \@ref(eq:evidence-appendix), but it is more numerically accurate, and requires little extra computation, to use the estimate
\begin{equation}
\tilde p_{\texttt{AQ}}(\mathbf{y}) = \sum_{\mathbf{v} \in \mathcal{Q}(1, l)} \tilde p_\texttt{LA}(x_i(\mathbf{v}), \mathbf{y}) \omega(\mathbf{v})
\end{equation}
5. Given $\{x_i(\mathbf{v}), \tilde p_\texttt{AQ}(x_i(\mathbf{v}) \, | \, \mathbf{y})\}_{\mathbf{v} \in \mathcal{Q}(1, l)}$ create a spline interpolant to each posterior marginal on the log-scale.
Samples, and thereby relevant posterior marginal summaries, may be obtained using inverse transform sampling.
## Simplified Naomi model description {#naomi-math}
This section describes the simplified version of the Naomi model [@eaton2021naomi] in more detail.
The concise $i$ indexing used in Section \@ref(naomi) is replaced by a more complete $x, s, a$ indexing.
There are four sections:
1. Section \@ref(naomi-process) gives the process specifications, giving the terms in each structured additive predictor, along with their distributions.
2. Section \@ref(naomi-likelihood) gives additional details about the likelihood terms not provided in Section \@ref(naomi).
3. Section \@ref(naomi-identifiability) gives identifiability constraints used in circumstances where incomplete data is available for the country.
4. Section \@ref(naomi-implementation) provides details of the `TMB` implementation.
### Process specification {#naomi-process}
| | Model component | Latent field | Hyperparameter |
| ----------------- | ------------------ | ------------ | -------------- |
| Section \@ref(hiv-prev) | HIV prevalence | $22 + 5n$ | 9 |
| Section \@ref(art-cov) | ART coverage | $25 + 5n$ | 9 |
| Section \@ref(hiv-inc) | HIV incidence rate | $2 + n$ | 3 |
| Section \@ref(anc-test) | ANC testing | $2 + 2n$ | 2 |
| Section \@ref(art-attend) | ART attendance | $n$ | 1 |
| | Total | $51 + 14n$ | 24 |
Table: (\#tab:process) The Naomi model can be conceptualised as having five processes. This table gives the number of latent field parameters and hyperparameters in each process, where $n$ is the number of districts in the country.
#### HIV prevalence {#hiv-prev}
HIV prevalence $\rho_{x, s, a} \in [0, 1]$ was modelled on the logit scale using the structured additive predictor
\begin{equation}
\text{logit}(\rho_{x, s, a}) = \beta^\rho_0 + \beta_{S}^{\rho, s = \text{M}} + \mathbf{u}^\rho_a + \mathbf{u}_a^{\rho, s = \text{M}} + \mathbf{u}^\rho_x + \mathbf{u}_x^{\rho, s = \text{M}} + \mathbf{u}_x^{\rho, a < 15} + \boldsymbol{\mathbf{\eta}}^\rho_{R_x, s, a}. (\#eq:prev)
\end{equation}
Table \@ref(tab:prev) provides a description of the terms included in Equation \@ref(eq:prev).
Independent half-normal prior distributions were chosen for the five standard deviation terms
\begin{equation}
\{\sigma_A^\rho, \sigma_{AS}^\rho, \sigma_X^\rho, \sigma_{XS}^\rho, \sigma_{XA}^\rho\} \sim \mathcal{N}^{+}(0, 2.5),
\end{equation}
independent uniform prior distributions for the two AR1 correlation parameters
\begin{equation}
\{\phi_A^\rho, \phi_{AS}^\rho\} \sim \mathcal{U}(-1, 1),
\end{equation}
and independent beta prior distributions for the two BYM2 proportion parameters
\begin{equation}
\{\phi_X^\rho, \phi_{XS}^\rho\} \sim \text{Beta}(0.5, 0.5).
\end{equation}
| Term | Distribution | Description |
|----------------------------------|-------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|
| $\beta^\rho_0$ | $\mathcal{N}(0, 5)$ | Intercept |
| $\beta_{s}^{\rho, s = \text{M}}$ | $\mathcal{N}(0, 5)$ | The difference in logit prevalence for men compared to women |
| $\mathbf{u}^\rho_a$ | $\text{AR}1(\sigma_A^\rho, \phi_A^\rho)$ | Age random effects for women |
| $\mathbf{u}_a^{\rho, s = \text{M}}$ | $\text{AR}1(\sigma_{AS}^\rho, \phi_{AS}^\rho)$ | Age random effects for the difference in logit prevalence for men compared to women age $a$ |
| $\mathbf{u}^\rho_x$ | $\text{BYM}2(\sigma_X^\rho, \phi_X^\rho)$ | Spatial random effects for women |
| $\mathbf{u}_x^{\rho, s = \text{M}}$ | $\text{BYM}2(\sigma_{XS}^\rho, \phi_{XS}^\rho)$ | Spatial random effects for the difference in logit prevalence for men compared to women in district $x$ |
| $\mathbf{u}_x^{\rho, a < 15}$ | $\text{ICAR}(\sigma_{XA}^\rho)$ | Spatial random effects for the difference in logit paediatric prevalence to adult women prevalence in district $x$ |
| $\boldsymbol{\mathbf{\eta}}^\rho_{R_x, s, a}$ | $-$ | Fixed offsets specifying assumed odds ratios for prevalence outside the age ranges for which data were available. Calculated from Spectrum model [@stover2019updates] outputs for region $R_x$ |
Table: (\#tab:prev) Each term in Equation \@ref(eq:prev) together with, where applicable, its prior distribution and a written description of its role.
#### ART coverage {#art-cov}
ART coverage $\alpha_{x, s, a} \in [0, 1]$ was modelled on the logit scale using the structured additive predictor
\begin{equation}
\text{logit}(\alpha_{x, s, a}) = \beta^\alpha_0 + \beta_{S}^{\alpha, s = \text{M}} + \mathbf{u}^\alpha_a + \mathbf{u}_a^{\alpha, s = \text{M}} + \mathbf{u}^\alpha_x + \mathbf{u}_x^{\alpha, s = \text{M}} + \mathbf{u}_x^{\alpha, a < 15} + \boldsymbol{\mathbf{\eta}}^\alpha_{R_x, s, a}
\end{equation}
with terms and prior distributions analogous to the HIV prevalence process model in Section \@ref(hiv-prev) above.
#### HIV incidence rate {#hiv-inc}
HIV incidence rate $\lambda_{x, s, a} > 0$ was modelled on the log scale using the structured additive predictor
\begin{equation}
\log(\lambda_{x, s, a}) = \beta_0^\lambda + \beta_S^{\lambda, s = \text{M}} + \log(\rho_{x}^{\text{15-49}}) + \log(1 - \omega \cdot \alpha_{x}^{\text{15-49}}) + \mathbf{u}_x^\lambda + \boldsymbol{\mathbf{\eta}}_{R_x, s, a}^\lambda. (\#eq:inc)
\end{equation}
Table \@ref(tab:inc) provides a description of the terms included in Equation \@ref(eq:inc).
| Term | Distribution | Description |
|-----------------------------------|----------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| $\beta^\lambda_0$ | $\mathcal{N}(0, 5)$ | Intercept term proportional to the average HIV transmission rate for untreated HIV positive adults |
| $\beta_S^{\lambda, s = \text{M}}$ | $\mathcal{N}(0, 5)$ | The log incidence rate ratio for men compared to women |
| $\rho_{x}^{\text{15-49}}$ | $-$ | The HIV prevalence among adults 15-49 in district $x$ calculated by aggregating age-specific HIV prevalences |
| $\alpha_{x}^{\text{15-49}}$ | $-$ | The ART coverage among adults 15-49 in district $x$ calculated by aggregating age-specific ART coverages |
| $\omega = 0.7$ | $-$ | Average reduction in HIV transmission rate per increase in population ART coverage fixed based on inputs to the Estimation and Projection Package (EPP) model |
| $\mathbf{u}_x^\lambda$ | $\mathcal{N}(0, \sigma^\lambda)$ | IID spatial random effects with $\sigma^\lambda \sim \mathcal{N}^+(0, 1)$ |
| $\boldsymbol{\mathbf{\eta}}^\lambda_{R_x, s, a}$ | $-$ | Fixed log incidence rate ratios by sex and age group calculated from Spectrum model outputs for region $R_x$ |
Table: (\#tab:inc) Each term in Equation \@ref(eq:inc) together with, where applicable, its prior distribution and a written description of its role.
The proportion recently infected among HIV positive persons $\kappa_{x, s, a} \in [0, 1]$ was modelled as
\begin{equation}
\kappa_{x, s, a} = 1 - \exp \left(- \lambda_{x, s, a} \cdot \frac{1 - \rho_{x, s, a}}{\rho_{x, s, a}} \cdot (\Omega_T - \beta_T ) - \beta_T \right),
\end{equation}
where $\Omega_T \sim \mathcal{N}(\Omega_{T_0}, \sigma^{\Omega_T})$ is the mean duration of recent infection, and $\beta_T \sim \mathcal{N}^{+}(\beta_{T_0}, \sigma^{\beta_T})$ is the false recent ratio.
The prior distribution for $\Omega_T$ was informed by the characteristics of the recent infection testing algorithm.
For PHIA surveys this was $\Omega_{T_0} = 130 \text{ days}$ and $\sigma^{\Omega_T} = 6.12 \text{ days}$.
For PHIA surveys there was assumed to be no false recency, such that $\beta_{T_0} = 0.0$, $\sigma^{\beta_T} = 0.0$, and $\beta_T = 0$.
#### ANC testing {#anc-test}
HIV prevalence $\rho_{x, a}^\text{ANC}$ and ART coverage $\alpha_{x, a}^\text{ANC}$ among pregnant women were modelled as being offset on the logit scale from the corresponding district-age indicators $\rho_{x, F, a}$ and $\alpha_{x, F, a}$ according to
\begin{align}
\text{logit}(\rho_{x, a}^{\text{ANC}}) &= \text{logit}(\rho_{x, F, a}) + \beta^{\rho^{\text{ANC}}} + \mathbf{u}_x^{\rho^{\text{ANC}}} + \boldsymbol{\mathbf{\eta}}_{R_x, a}^{\rho^{\text{ANC}}}, (\#eq:anc1) \\
\text{logit}(\alpha_{x, a}^{\text{ANC}}) &= \text{logit}(\alpha_{x, F, a}) + \beta^{\alpha^{\text{ANC}}} + \mathbf{u}_x^{\alpha^{\text{ANC}}} + \boldsymbol{\mathbf{\eta}}_{R_x, a}^{\alpha^{\text{ANC}}} (\#eq:anc2).
\end{align}
Table \@ref(tab:anc) provides a description of the terms included in Equation \@ref(eq:anc1) and Equation \@ref(eq:anc2).
| Term | Distribution | Description |
|-----------------------------------------|--------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| $\beta^{\theta^{\text{ANC}}}$ | $\mathcal{N}(0, 5)$ | Intercept giving the average difference between population and ANC outcomes |
| $\mathbf{u}_x^{\theta^{\text{ANC}}}$ | $\mathcal{N}(0, \sigma_X^{\theta^{\text{ANC}}})$ | IID district random effects with $\sigma_X^{\theta^{\text{ANC}}} \sim \mathcal{N}^+(0, 1)$ |
| $\boldsymbol{\mathbf{\eta}}_{R_x, a}^{\theta^{\text{ANC}}}$ | $-$ | Offsets for the log fertility rate ratios for HIV positive women compared to HIV negative women and for women on ART to HIV positive women not on ART, calculated from Spectrum model outputs for region $R_x$ |
Table: (\#tab:anc) Each term in Equations \@ref(eq:anc1) and \@ref(eq:anc2) together with (where applicable) its prior distribution and a written description of its role. The notation $\theta$ is used as stand in for $\theta \in \{\rho, \alpha\}$.
In the full Naomi model, for adult women 15-49 the number of ANC clients $\Psi_{x, a} > 0$ were modelled as
\begin{equation}
\log (\Psi_{x, a}) = \log (N_{x, \text{F}, a}) + \psi_{R_x, a} + \beta^\psi + \mathbf{u}_x^\psi,
\end{equation}
where $N_{x, \text{F}, a}$ are the female population sizes, $\psi_{R_x, a}$ are fixed age-sex fertility ratios in Spectrum region $R_x$, $\beta^\psi$ are log rate ratios for the number of ANC clients relative to the predicted fertility, and $\mathbf{u}_x^\psi \sim \mathcal{N}(0, \sigma^\psi)$ are district random effects.
Here these terms are fixed to $\beta^\psi = 0$ and $\mathbf{u}_x^\psi = \mathbf{0}$ such that $\Psi_{x, a}$ are simply constants.
#### ART attendance {#art-attend}
```{r}
log_odds <- function(p) log(p / (1 - p))
logistic <- function(x) 1 / (1 + exp(-x))
# 100 * round(logistic(-4), 3)
```
Let $\gamma_{x, x'} \in [0, 1]$ be the probability that a person on ART residing in district $x$ receives ART in district $x'$.
Assume that $\gamma_{x, x'} = 0$ for $x \notin \{x, \text{ne}(x)\}$ such that individuals seek treatment only in their residing district or its neighbours $\text{ne}(x) = \{x': x' \sim x\}$, where $\sim$ is an adjacency relation, and $\sum_{x' \in \{x, \text{ne}(x)\}} \gamma_{x, x'} = 1$.
The probabilities $\gamma_{x, x'}$ for $x \sim x'$ were modelled using multinomial logistic regression model, based on the log-odds ratios
\begin{equation}
\tilde \gamma_{x, x'} = \log \left( \frac{\gamma_{x, x'}}{1 - \gamma_{x, x'}} \right) = \tilde \gamma_0 + \mathbf{u}_x^{\tilde \gamma}. (\#eq:log-or)
\end{equation}
Table \@ref(tab:log-or) provides a description of the terms included in Equation \@ref(eq:log-or).
Fixing $\tilde \gamma_{x, x} = 0$ then the multinomial probabilities may be recovered using the softmax
\begin{equation}
\gamma_{x, x'} = \frac{\exp(\tilde \gamma_{x, x'})}{\sum_{x^\star \in \{x, \text{ne}(x)\}} \exp(\tilde \gamma_{x, x^\star})}.
\end{equation}
| Term | Distribution | Description |
|--------------------------------|--------------------------------------------|-------------------------------------------------------------------------------------|
| $\tilde \gamma_0$ | $-$ | Fixed intercept $\tilde \gamma_0 = -4$. Implies a prior mean on $\gamma_{x, x'}$ of 1.8%, such that a-priori $(100 - 1.8 \times \text{ne}(x))\%$ of ART clients in district $x$ obtain treatment in their home district |
| $\mathbf{u}_x^{\tilde \gamma}$ | $\mathcal{N}(0, \sigma_X^{\tilde \gamma})$ | District random effects, with $\sigma_X^{\tilde \gamma} \sim \mathcal{N}^+(0, 2.5)$ |
Table: (\#tab:log-or) Each term in Equation \@ref(eq:log-or) together with, where applicable, its prior distribution and a written description of its role.
As no terms include $x'$, $\gamma_{x, x'}$ is only a function of $x$.
### Additional likelihood specification {#naomi-likelihood}
Though Section \@ref(naomi) provides a complete description of Naomi's likelihood specification, any additional useful details are provided here.
#### Household survey data
The generalised binomial $y \sim \text{xBin}(m, p)$ is defined for $y, m \in \mathbb{R}^+$ with $y \leq m$ such that
\begin{align}
\log p(y) = &\log \Gamma(m + 1) - \log \Gamma(y + 1) \\
&- \log \Gamma(m - y + 1) + y \log p + (m - y) \log(1 - p),
\end{align}
where the gamma function $\Gamma$ is such that $\forall n \in \mathbb{N}$, $\Gamma(n) = (n - 1)!$.
### Identifiability constraints {#naomi-identifiability}
If data are missing, some parameters are fixed to default values to help with identifiability.
In particular:
1. If survey data on HIV prevalence or ART coverage by age and sex are not available then $\mathbf{u}_a^\theta = 0$ and $\mathbf{u}_{a, s = \text{M}}^\theta = 0$.
In this case, the average age-sex pattern from the Spectrum is used.
For the Malawi case-study (Section \@ref(malawi)), HIV prevalence and ART coverage data are not available for those aged 65+.
As a result, there are $|\{\text{0-4}, \ldots, \text{50-54}\}| = 13$ age groups included for the age random effects.
2. If no ART data, either survey or ART programme, are available but data on ART coverage among ANC clients are available, the level of ART coverage is not identifiable, but spatial variation is identifiable.
In this instance, overall ART coverage is determined by the Spectrum offset, and only area random effects are estimated such that
\begin{equation}
\text{logit} \left(\alpha_{x, s, a} \right) = \mathbf{u}_x^\alpha + \boldsymbol{\mathbf{\eta}}_{R_x, s, a}^\alpha.
\end{equation}
3. If survey data on recent HIV infection are not included in the model, then $\beta_0^\lambda = \beta_S^{\lambda, s = \text{M}} = 0$ and $\mathbf{u}_x^\lambda = \mathbf{0}$.
The sex ratio for HIV incidence is determined by the sex incidence rate ratio from Spectrum, and the incidence rate in all districts is modelled assuming the same average HIV transmission rate for untreated adults, but varies according to district-level estimates of HIV prevalence and ART coverage.
### Implementation {#naomi-implementation}
The `TMB` C++ code for the negative log-posterior of the simplified Naomi model is available from [`https://github.com/athowes/naomi-aghq`](https://github.com/athowes/naomi-aghq).
For ease of understanding, Table \@ref(tab:tmb-lookup) provides correspondence between the mathematical notation used in Section \@ref(naomi-math) and the variable names used in the `TMB` code, for all hyperparameters and latent field parameters.
For further reference on the `TMB` software see @tmbdocumentation.
| Variable name | Notation | Type | Domain | $\rho$ | $\alpha$ | $\lambda$ |
|----------------------------|-------------|--------|---------|--------|----------|-----------|
| `logit_phi_rho_x` | $\text{logit}(\phi_X^\rho)$ | Hyper | $\mathbb{R}$ | Yes | | |
| `log_sigma_rho_x` | $\log(\sigma_X^\rho)$ | Hyper | $\mathbb{R}$ | Yes | | |
| `logit_phi_rho_xs` | $\text{logit}(\phi_{XS}^\rho)$ | Hyper | $\mathbb{R}$ | Yes | | |
| `log_sigma_rho_xs` | $\log(\sigma_{XS}^\rho)$ | Hyper | $\mathbb{R}$ | Yes | | |
| `logit_phi_rho_a` | $\text{logit}(\phi_A^\rho)$ | Hyper | $\mathbb{R}$ | Yes | | |
| `log_sigma_rho_a` | $\log(\sigma_A^\rho)$ | Hyper | $\mathbb{R}$ | Yes | | |
| `logit_phi_rho_as` | $\text{logit}(\phi_{AS}^\rho)$ | Hyper | $\mathbb{R}$ | Yes | | |
| `log_sigma_rho_as` | $\log(\sigma_{AS}^\rho)$ | Hyper | $\mathbb{R}$ | Yes | | |
| `log_sigma_rho_xa` | $\log(\sigma_{XA}^\rho)$ | Hyper | $\mathbb{R}$ | Yes | | |
| `logit_phi_alpha_x` | $\text{logit}(\phi_X^\alpha)$ | Hyper | $\mathbb{R}$ | | Yes | |
| `log_sigma_alpha_x` | $\log(\sigma_X^\alpha)$ | Hyper | $\mathbb{R}$ | | Yes | |
| `logit_phi_alpha_xs` | $\text{logit}(\phi_{XS}^\alpha)$ | Hyper | $\mathbb{R}$ | | Yes | |
| `log_sigma_alpha_xs` | $\log(\sigma_{XS}^\alpha)$ | Hyper | $\mathbb{R}$ | | Yes | |
| `logit_phi_alpha_a` | $\text{logit}(\phi_A^\alpha)$ | Hyper | $\mathbb{R}$ | | Yes | |
| `log_sigma_alpha_a` | $\log(\sigma_A^\alpha)$ | Hyper | $\mathbb{R}$ | | Yes | |
| `logit_phi_alpha_as` | $\text{logit}(\phi_{AS}^\alpha)$ | Hyper | $\mathbb{R}$ | | Yes | |
| `log_sigma_alpha_as` | $\log(\sigma_{AS}^\alpha)$ | Hyper | $\mathbb{R}$ | | Yes | |
| `log_sigma_alpha_xa` | $\log(\sigma_{XA}^\alpha)$ | Hyper | $\mathbb{R}$ | | Yes | |
| `OmegaT_raw` | $\Omega_T$ | Hyper | $\mathbb{R}$ | | | Yes |
| `log_betaT` | $\log(\beta_T)$ | Hyper | $\mathbb{R}$ | | | Yes |
| `log_sigma_lambda_x` | $\log(\sigma^\lambda)$ | Hyper | $\mathbb{R}$ | | | Yes |
| `log_sigma_ancrho_x` | $\log(\sigma_X^{\rho^{\text{ANC}}})$ | Hyper | $\mathbb{R}$ | | Yes | |
| `log_sigma_ancalpha_x` | $\log(\sigma_X^{\alpha^{\text{ANC}}})$ | Hyper | $\mathbb{R}$ | | Yes | |
| `log_sigma_or_gamma` | $\log(\sigma_X^{\tilde \gamma})$ | Hyper | $\mathbb{R}$ | | | |
| `beta_rho` | $(\beta^\rho_0, \beta_{s}^{\rho, s = \text{M}})$ | Latent | $\mathbb{R}^2$ | Yes | | |
| `beta_alpha` | $(\beta^\alpha_0, \beta_{S}^{\alpha, s = \text{M}})$ | Latent | $\mathbb{R}^2$ | | Yes | |
| `beta_lambda` | $(\beta_0^\lambda, \beta_S^{\lambda, s = \text{M}})$ | Latent | $\mathbb{R}^2$ | | | Yes |
| `beta_anc_rho` | $\beta^{\rho^{\text{ANC}}}$ | Latent | $\mathbb{R}$ | | Yes | |
| `beta_anc_alpha` | $\beta^{\alpha^{\text{ANC}}}$ | Latent | $\mathbb{R}$ | | Yes | |
| `u_rho_x` | $\mathbf{w}^\rho_x$ | Latent | $\mathbb{R}^{n}$ | Yes | | |
| `us_rho_x` | $\mathbf{v}^\rho_x$ | Latent | $\mathbb{R}^{n}$ | Yes | | |
| `u_rho_xs` | $\mathbf{w}_x^{\rho, s = \text{M}}$ | Latent | $\mathbb{R}^{n}$ | Yes | | |
| `us_rho_xs` | $\mathbf{v}_x^{\rho, s = \text{M}}$ | Latent | $\mathbb{R}^{n}$ | Yes | | |
| `u_rho_a` | $\mathbf{u}^\rho_a$ | Latent | $\mathbb{R}^{10}$ | Yes | | |
| `u_rho_as` | $\mathbf{u}_a^{\rho, s = \text{M}}$ | Latent | $\mathbb{R}^{10}$ | Yes | | |
| `u_rho_xa` | $\mathbf{u}_x^{\rho, a < 15}$ | Latent | $\mathbb{R}^{n}$ | Yes | | |
| `u_alpha_x` | $\mathbf{w}^\alpha_x$ | Latent | $\mathbb{R}^{n}$ | | Yes | |
| `us_alpha_x` | $\mathbf{v}^\alpha_x$ | Latent | $\mathbb{R}^{n}$ | | Yes | |
| `u_alpha_xs` | $\mathbf{w}_x^{\alpha, s = \text{M}}$ | Latent | $\mathbb{R}^{n}$ | | Yes | |
| `us_alpha_xs` | $\mathbf{v}_x^{\alpha, s = \text{M}}$ | Latent | $\mathbb{R}^{n}$ | | Yes | |
| `u_alpha_a` | $\mathbf{u}^\alpha_a$ | Latent | $\mathbb{R}^{13}$ | | Yes | |
| `u_alpha_as` | $\mathbf{u}_a^{\alpha, s = \text{M}}$ | Latent | $\mathbb{R}^{10}$ | | Yes | |
| `u_alpha_xa` | $\mathbf{u}_x^{\alpha, a < 15}$ | Latent | $\mathbb{R}^{n}$ | | Yes | |
| `ui_lambda_x` | $\mathbf{u}_x^\lambda$ | Latent | $\mathbb{R}^{n}$ | | | Yes |
| `ui_anc_rho_x` | $\mathbf{u}_x^{\rho^{\text{ANC}}}$ | Latent | $\mathbb{R}^{n}$ | | Yes | |
| `ui_anc_alpha_x` | $\mathbf{u}_x^{\alpha^{\text{ANC}}}$ | Latent | $\mathbb{R}^{n}$ | | Yes | |
| `log_or_gamma` | $\mathbf{u}_x^{\tilde \gamma}$ | Latent | $\mathbb{R}^{n}$ | | |
Table: (\#tab:tmb-lookup) Correspondence between the variable name used in the Naomi `TMB` template and the mathematical notation used in Appendix \@ref(naomi-math). The parameter type, either a hyperparameter or element of the latent field, is also given. All of the parameters are defined on the real-scale in some dimension. In the final three columns ($\rho$, $\alpha$, and $\lambda$) indication is given as to which component of the model the parameter is primarily used in.
## NUTS convergence and suitability
```{r}
naomi_nuts <- readRDS(paste0("resources/naomi-aghq/", resource_version, "/depends/mcmc-out.rds"))
```
(ref:rhat) For NUTS run on the Naomi ELGM, the maximum potential scale reduction factor was `r round(naomi_nuts$rhat_max, 3)`, below the value of 1.05 typically used as a cutoff for acceptable chain mixing, indicating that the results are acceptable to use. Additionally, the vast majority (`r round(100 * naomi_nuts$rhat_above_threshold, 1)`%) of $\hat R$ values were less than 1.1.
```{r rhat, fig.cap="(ref:rhat)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/rhat.png"))
```
(ref:ess) The efficiency of the NUTS, as measured by the ratio of effective sample size to total number of iterations run, was low for most parameters (Panel A). As a result, the number of iterations required for the the effective number of samples (mean `r round(naomi_nuts$ess_mean)`) to be satisfactory was high (Panel B).
```{r ess, fig.cap="(ref:ess)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/ess.png"))
```
(ref:worst-trace) Traceplots for the parameter with the lowest ESS which was `log_sigma_alpha_xs` (an $\text{ESS}$ of `r round(naomi_nuts$ess_min)`, Panel A) and highest potential scale reduction factor which was `ui_lambda_x[10]` (an $\hat R$ of `r round(naomi_nuts$rhat_max, 3)`, Panel B).
```{r worst-trace, fig.cap="(ref:worst-trace)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/worst-trace.png"))
```
(ref:rho-a) Pairs plots for the parameters $\log(\sigma_{A}^\rho)$ and $\text{logit}(\phi_{A}^\rho)$, or `log_sigma_rho_a` and `logit_phi_rho_a` as implemented in code. These parameters are the log standard deviation and logit lag-one correlation parameter of an AR1 process. In the posterior distribution obtained with NUTS, they have a high degree of correlation.
```{r rho-a, fig.cap="(ref:rho-a)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/rho_a.png"))
```
(ref:alpha-x) Pairs plots for the parameters $\log(\sigma_X^\alpha)$ and $\text{logit}(\phi_X^\alpha)$, or `log_sigma_alpha_x` and `logit_phi_alpha_x` as implemented in code. These parameters are the log standard deviation and logit BYM2 proportion parameter of a BYM2 process. In the posterior distribution obtained with NUTS, they are close to uncorrelated.
```{r alpha-x, fig.cap="(ref:alpha-x)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/alpha_x.png"))
```
(ref:nuts-hand-comparison) Prior standard deviations were calculated by using NUTS to simulate from the prior distribution. This approach is more convenient than simulating directly from the model, but can lead to inaccuracies.
```{r nuts-hand-comparison, fig.cap="(ref:nuts-hand-comparison)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/nuts-hand-comparison.png"))
```
(ref:posterior-contraction) The posterior contraction for each parameter in the model. Values are averaged for parameters of length greater than one. The posterior contraction is zero when the prior distribution and posterior distribution have the same standard deviation. This could indicate that the data is not informative about the parameter. The closer the posterior contraction is to one, the more than the marginal posterior distribution has concentrated about a single point.
```{r posterior-contraction, fig.cap="(ref:posterior-contraction)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/posterior-contraction.png"))
```
## Use of PCA-AGHQ
```{r}
u <- runif(10000)
```
(ref:nodes-quantiles-sd) The standard deviation of the quadrature nodes can be used as a measure of coverage of the posterior marginal distribution. Nodes spaced evenly within the marginal distribution would be expected to uniformly distributed quantile, corresponding to a standard deviation of `r signif(sd(u), 4)`, shown as a dashed line.
```{r nodes-quantiles-sd, fig.cap="(ref:nodes-quantiles-sd)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/nodes-quantiles-sd.png"))
```
(ref:marginal-sd) The estimated posterior marginal standard deviation of each hyperparameter varied substantially based on its scale, either logarithmic or logistic.
```{r marginal-sd, fig.cap="(ref:marginal-sd)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/marginal-sd.png"))
```
(ref:lognormconst) The logarithm of the normalising constant estimated using PCA-AGHQ and a range of possible values of $k = 2, 3, 5$ and $s \leq 8$. Using this range of settings, there was not convergence of the logarithm of the normalising constant estimate. The time taken by GPCA-AGHQ increases exponentially with number of PCA-AGHQ dimensions kept.
```{r lognormconst, fig.cap="(ref:lognormconst)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/lognormconst.png"))
```
## Inference comparison
### Point estimates
(ref:mean-alt-output) Differences in Naomi model output posterior means as estimated by GEB and GPCA-AGHQ compared to NUTS. Each point is an estimate of the indicator for a particular strata. In all cases, error is reduced by GPCA-AGHQ, most of all for ART coverage.
```{r mean-alt-output, fig.cap="(ref:mean-alt-output)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/mean-alt-output.png"))
```
(ref:sd-alt-output) Differences in Naomi model output posterior standard deviations as estimated by GEB and GPCA-AGHQ compared to NUTS. Each point is an estimate of the indicator for a particular strata. Error is increased by GPCA-AGHQ for HIV prevalence and HIV incidence, and reduced for ART coverage.
```{r sd-alt-output, fig.cap="(ref:sd-alt-output)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/sd-alt-output.png"))
```
### Distributional quantities
(ref:ks-ess) The Kolmogorov-Smirnov (KS) test statistic for each latent field parameter is correlated with the effective sample size (ESS) from NUTS, for both GEB and GPCA-AGHQ. This may be because parameters which are harder to estimate with INLA-like methods also have posterior distributions which are more difficult to sample from. Alternatively, it may be that high KS values are caused by inaccurate NUTS estimates generated by limited effective samples.
```{r ks-ess, fig.cap="(ref:ks-ess)"}
knitr::include_graphics(paste0("resources/naomi-aghq/", resource_version, "/depends/ks-ess.png"))
```