OSA residuals #1

Open
catarinawor opened this issue Mar 15, 2024 · 6 comments

@catarinawor commented Mar 15, 2024

Discussion on OSA residuals
Here are a couple more links on OSA residuals:
Slides by Cole Monnahan on using OSA Residuals in stock assessment: Slide 1 (npfmc.org)
An R package for calculating OSA Residuals for compositions: https://github.com/fishfollower/compResidual

Papers:
Trijoulet et al. 2023
Li et al. 2024
Thygesen et al. 2017

@paul-vdb

I added the OSA example I wrote to the repo here: https://github.com/pbs-assess/renewassess/blob/main/code/OSA/OSA_Multinomial.R

Please feel free to edit it if it's wrong!

@seananderson

At nearly the same time(!) I added some notes in https://github.com/pbs-assess/renewassess/blob/main/osa-notes.Rmd on what's happening within the TMB machinery to calculate OSA oneStepGeneric residuals. It's pretty crazy to wrap your head around... data become parameters and you profile over them.
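
For anyone following along, the user-facing entry point for that machinery is TMB's oneStepPredict(). A minimal sketch of the call (the objective obj, the observation name "x", and the keep indicator here are placeholders, not necessarily what the repo's template uses):

```r
library(TMB)

# Sketch only: assumes the C++ template declares DATA_VECTOR(x) and
# DATA_VECTOR_INDICATOR(keep, x), and that obj is the fitted MakeADFun object.
osa <- oneStepPredict(
  obj,
  observation.name    = "x",     # name of the observation object in the template
  data.term.indicator = "keep",  # indicator used to drop observations one at a time
  method              = "oneStepGeneric",
  discrete            = TRUE     # counts are discrete, so the PIT is randomized
)
head(osa$residual)               # ~ N(0, 1) if the model is consistent with the data
```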

@paul-vdb

@seananderson I need to digest that a bit.
How does this sound as a hand-wavy understanding?
For a value x[i], remove x[(i+1):n] from the likelihood contribution and evaluate the likelihood over the domain (-Inf, x[i]], approximating that integral via a spline (or whatever their profile-likelihood machinery uses?). Then it's straightforward to transform the CDF to the standard normal scale, and that's the residual.
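
In code, that hand-wavy version looks something like the sketch below. This is purely the probability-integral-transform idea; F_cond is a stand-in for the conditional CDF under the fitted model, not anything from TMB:

```r
# Sketch of the (randomized) probability integral transform behind OSA residuals.
# F_cond(x, i) is a placeholder for the conditional CDF
# P(X_i <= x | x_1, ..., x_{i-1}) under the fitted model.
osa_residuals <- function(x, F_cond) {
  n <- length(x)
  z <- numeric(n)
  for (i in seq_len(n)) {
    Fx  <- F_cond(x[i], i)      # conditional CDF at the observation
    Fxm <- F_cond(x[i] - 1, i)  # conditional CDF just below it (discrete data)
    u   <- runif(1, Fxm, Fx)    # randomize within the jump of the discrete CDF
    z[i] <- qnorm(u)            # standard normal if the model is right
  }
  z
}
```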

@seananderson

Well, that was an enlightening exercise... I got your R script, @paul-vdb, to match the TMB OSA residuals exactly here. Partly I did the calculations the way TMB does so I could check mine as I went, and that solidified some of my understanding of how the observation likelihoods accumulate as you step through the data. I also edited your original script to vectorize the runif() call and set the seed, to rule that out as the source of any difference. At some point it would be good to write up that matching version more elegantly than how I left it.

@paul-vdb

Thanks @seananderson

@paul-vdb commented Mar 18, 2024

@seananderson I dug deeper. The way I wrote the original OSA was correct. It didn't match under the same set.seed because they (RTMB) simulate the runif values for the full set of observations and then subset, whereas I subset from the outset, so that difference in the length of the runif() call was what made the difference. Thank goodness, because when I went through the maths of your OSA2 file it was exactly the same as mine. Looking at the output from the oneStepPredict function, it was also calculating the same Fx and px values as I was! So in short, the original function is technically correct for the multinomial with no other complexity.
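
A tiny illustration of that seed mismatch (the lengths and indices are just for demonstration; this shows only the same-seed, different-stream-position effect, not the actual oneStepPredict internals):

```r
set.seed(123)
u_full <- runif(10)   # draw for the full set of observations...
u_a <- u_full[6:10]   # ...then subset

set.seed(123)
u_b <- runif(5)       # draw only for the subset directly

identical(u_a, u_b)   # FALSE: same seed, but different positions in the stream
```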

From Equation 8 in Thygesen et al. (2017),
$f_i(y) = \int f_{\bar{x}\, y_1^i}(y_{i-1}, y) \, d\bar{x}$

What makes life easy for the multinomial is that when you write it as conditional binomials, they are independent, and as a result $f(y_{i-1}, y) = f(y_{i-1}) \times f(y)$. Thus all of those nll_up_to_obs terms cancel out and the equation is simply
$F_x = F(y_i) = \text{pbinom}(y_i, n_i, p_i)$
and
$p_x = f(y_i)$.

Their way of calculating z,
u = runif(1)
z = qnorm(Fx - u * px),
is equivalent to how I had it:
z = qnorm(runif(1, F(x - 1), Fx)),
since for a discrete distribution F(x - 1) = Fx - px.
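
Putting that together, here is a minimal sketch of the conditional-binomial calculation for a single multinomial observation (the function and variable names are mine, not taken from OSA_Multinomial.R):

```r
# Sketch: randomized-quantile (OSA) residuals for one multinomial observation,
# written as a chain of conditional binomials. x = observed counts, p = fitted
# proportions. The last category is fully determined, so it gets no residual.
multinomial_osa <- function(x, p) {
  k <- length(x)
  N <- sum(x)
  z <- numeric(k - 1)
  for (i in seq_len(k - 1)) {
    n_i <- N - sum(x[seq_len(i - 1)])  # trials left after earlier categories
    p_i <- p[i] / sum(p[i:k])          # conditional probability of category i
    Fx  <- pbinom(x[i], n_i, p_i)      # F(y_i)
    px  <- dbinom(x[i], n_i, p_i)      # f(y_i)
    u   <- runif(1)
    z[i] <- qnorm(Fx - u * px)         # same as qnorm(runif(1, Fx - px, Fx))
  }
  z
}
```

For example, multinomial_osa(x = c(5, 3, 2), p = c(0.5, 0.3, 0.2)) returns two residuals that should look roughly standard normal if the fitted proportions are adequate.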
