-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RFC: Refactor LatentGP API slightly #216
base: master
Are you sure you want to change the base?
Conversation
@@ -3,10 +3,9 @@ | |||
x = rand(10) | |||
y = rand(10) | |||
|
|||
lgp = LatentGP(gp, x -> MvNormal(x, 0.1), 1e-5) | |||
lgp = LatentGP(gp, x -> (f -> MvNormal(f, 0.1)), 1e-5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This constructor of MvNormal
was deprecated (JuliaStats/Distributions.jl#1362), it should be replaced with e.g.
lgp = LatentGP(gp, x -> (f -> MvNormal(f, 0.1)), 1e-5) | |
lgp = LatentGP(gp, x -> (f -> MvNormal(f, 0.01 * I)), 1e-5) |
A Haskell coder would say that's just how functions work 😂 x -> f -> lik But I do agree it's a bit cumbersome here. What was your motivation for doing it this way? What do you think of typing up some concrete use-cases? (not the full implementation, just how the api usage might look) I'm assuming you are thinking of some likelihood where location = gp(x) and scale = deterministic(x). How would we then handle location = gp1(x) and scale = gp2(x)? |
What would stop us from simply passing the inputs x directly to the likelihood object when calling it? The LatentFiniteGP already carries x with it, so we could extract it from there in Edited to add: We could also define a trait for "wants to be passed the inputs" so all the likelihoods that don't actually care about |
It just feels really clean / natural to mirror the
I guess this relates to the above. This would also be a perfectly reasonable way to do this. Whichever we went with, I think I would rather be consistent. Either we have struct LatentFiniteGP{Tlgp<:LatentGP, Tx<:AbstractVector}
lgp::Tlgp
x::Tx
end and build everything on the fly (which is totally fine, it's all essentially zero-overhead), or we do struct LatentFiniteGP{Tfx<:LatentGP, Tlikx}
fx::Tfx
likx::Tlikx
end in which both I can't say that I'm overly fussed either way. Do you have a preference @st-- ?
I've not thought especially about how users will find using this -- we can always build utilities on top of the API provided that it supports everything that we need. In terms of low-level usage, I was imagining something like (as a heteroscedastic example): f = GP(Matern52Kernel())
g = my_favourite_positive_valued_function
jitter = 1e-6
latent_gp = LatentGP(f, x -> (f -> MvNormal(f, map(g, x))), jitter)
elbo(approx, latent_gp(x), y) or something qualitatively like this.
You read my mind. For example f = GP(LinearMixingModelKernel(randn(2, 2), [Matern52Kernel(), Matern52Kernel()]))
lik(x::MOInputIsotopicByOutput) = (f -> MvNormal(f(x[1:N]), map(exp, f(x[N+1:end])))
jitter = 1e-6
latent_gp = LatentGP(f, lik, jitter) This second example is really quite unpleasant, so we will definitely need utilities built on top of this to make it usable in practice, but hopefully the gist is clear. The point is just that we would now have the flexibility to express something like this. (Incidentally, one of the things I like about this is that it doesn't privelidge mean-field-per-process approximate posteriors)
I like this suggestion, as I agree that it would be sad if likelihoods that don't care about inputs had to care about them. edit: the more that I think about it, the more that I like struct LatentFiniteGP{Tlgp<:LatentGP, Tx<:AbstractVector}
lgp::Tlgp
x::Tx
end as the structure of the edit2: the only other reason to like the |
Not sure if it is a good idea but maybe one could dispatch on |
That would definitely do it. Do we know whether |
@willtebbutt your two examples seem inconsistent (and that's why I wanted to see some mock code usage!): in your first example, |
You're completely correct. Good point. The second example should instead be something like f = GP(LinearMixingModelKernel(randn(2, 2), [Matern52Kernel(), Matern52Kernel()]))
lik(x::MOInputIsotopicByOutput) = (f::AbstractVector{<:Real} -> MvNormal(f[1:N], map(exp, f[N+1:end]))
jitter = 1e-6
latent_gp = LatentGP(f, lik, jitter) The point here is that the type of |
Made me wonder if perhaps we could unify the two isotopic MO inputs into a single parametric type, so that we can pass-through dispatch on the type of isotopy, e.g. lik(x::MOInputIsotopic{T}) = (f::AbstractVector{<:Real} -> MvNormal(get_mo_output(T, f, 1), map(exp, get_mo_output(T, f, 2))) ? but actually for that to work you do need to know how many different outputs there are ... so maybe a lik(x::MOInputIsotopic) = (f::AbstractVector{<:Real} -> MvNormal(get_mo_output(x.type, f, 1), map(exp, get_mo_output(x.type, f, 2))) but it still seems pretty clunky. And how would it work if you want to build a heterotopic multi-output heteroskedastic model, so if there are observations y1 and y2 which you may observe at different inputs, but wherever you observe y1 you need both f1 and f2, and where you observe y2 you also need f3 and f4? 😂 |
Yeah, this is a good point. Perhaps this could be achieved by nesting multi-output models? e.g. GP(LinearMixingModelKernel([
LinearMixingModelKernel(kernels_for_mean_processes),
LinearMixingModelKernel(kernels_for_variance_processes),
])) Might need to be nested the other way around, but hopefully the idea is clear. An input for this would be something like julia> feature_1, feature_2, feature_3 = 1.0, 2.0, 3.0
(1.0, 2.0, 3.0)
julia> MOInputIsotopicByOutputs([(feature_1, 1), (feature_2, 2), (feature_3, 1)], 2)
6-element MOInputIsotopicByOutputs{Tuple{Float64, Int64}, Vector{Tuple{Float64, Int64}}}:
((1.0, 1), 1)
((2.0, 2), 1)
((3.0, 1), 1)
((1.0, 1), 2)
((2.0, 2), 2)
((3.0, 1), 2) or whatever.
This could make sense. I wonder what the tradeoffs are here vs the supertype option that we've discussed before? edit: I missed the mixing matrices from the kernels above. I was imagining they'd just be the identity matrix here, although maybe there are situations in which you want your noise variance to be proportional to the mean... |
What about having an
since input dependent likelihoods are more "rare" that would make more sense to need an extra effort to make one? |
NB- |
As discussed in #202
I don't want to merge this until we're ready to make another breaking release (I couldn't figure out how to do this without being breaking).
edit: my only reservation about this PR is that having to provide a function which maps to a function which maps to a distribution feels a bit complicated. Not sure whether there's a better approach. See the tests for an example -- it's just rather complicated for humans to interact with.