Introduce PIC approximation #492

Open: peddie wants to merge 12 commits into master from peddie/pic
Commits:

- 8090c47 Make BlockDiagonalLDLT serialiseable
- 31a0b21 Factor out common sparse GP stuff
- 0af3b22 Introduce PIC approximation
- f2d36a1 format fixups
- 78348f5 Remove debug spam
- 9ba34bf Remove unused group tracker
- 7612fcf serialization fixups
- 179dd64 missing operator== for block diagonal LDLT
- cdf4337 Avoid linker collision
- ae2f561 Remove unimplemented PIC functions
- 86b38ea Cleanups and comments
- 9bb009e Update Albatross to use vendored MKL distribution
#################################################
PIC math scratchpad
#################################################

.. _crappy-pic:

-------------------------------------------------------------------
Relationship between :math:`\sigma_\cancel{B}` and :math:`\sigma_f`
-------------------------------------------------------------------

It seems like what we need first of all is some way to get the PITC posterior variance based only on :math:`\cancel{B}`, using only the full PITC posterior and calculations that scale only with :math:`|B|`. I'm not totally finished working out how :math:`(\sigma^2_{*})^{\cancel{B} \cancel{B}}` fits into the whole posterior :math:`(\sigma^2_*)^{PIC}`.

Observe that since :math:`\mathbf{\Lambda}` is a block-diagonal matrix, its inverse is too, so when it sits in the middle of a quadratic form, the cross terms disappear. As far as I can tell, this is the only place we can additively separate :math:`\mathbf{Q}_{\cancel{B} \cancel{B}}` and :math:`\mathbf{K}_{B B}` without explicit cross terms. Start with the usual application of Woodbury's lemma to the PITC posterior covariance:
.. math::

   \newcommand{\Lam}{\mathbf{\Lambda}}
   \newcommand{\Kuu}{\mathbf{K}_{uu}}
   \newcommand{\Kuf}{\mathbf{K}_{uf}}
   \newcommand{\Kfu}{\mathbf{K}_{fu}}
   (\sigma_*^2) &= K_* - \mathbf{Q}_{* f} \left(\mathbf{K}_{fu} \mathbf{K}_{uu}^{-1} \mathbf{K}_{uf} + \mathbf{\Lambda} \right)^{-1} \mathbf{Q}_{f *} \\
   &= K_* - \mathbf{K}_{*u} \Kuu^{-1} \Kuf \left(\Kfu \Kuu^{-1} \Kuf + \Lam\right)^{-1} \Kfu \Kuu^{-1} \mathbf{K}_{u*} \\
   &= K_* - \mathbf{K}_{*u} \Kuu^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \left( \Kuu + \Kuf \Lam^{-1} \Kfu \right)^{-1} \mathbf{K}_{u*}
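The second-to-third step above is the standard Woodbury "push-through" rearrangement. A quick exact-arithmetic sanity check of that rearrangement in pure Python, with tiny made-up matrices (nothing here comes from the PR's code; I've shrunk :math:`|f|` to 2 so every inverse is 2x2):

```python
# Check: Kuu^{-1} Kuf (Lam + Kfu Kuu^{-1} Kuf)^{-1} Kfu Kuu^{-1}
#      = Kuu^{-1} - (Kuu + Kuf Lam^{-1} Kfu)^{-1}
# using exact rational arithmetic; all matrices are small made-up examples.
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(M):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def msub(A, B):
    return [[x - y for x, y in zip(r1, r2)] for r1, r2 in zip(A, B)]

def transpose(M):
    return [list(r) for r in zip(*M)]

Kuu = [[F(3), F(1)], [F(1), F(2)]]   # 2 inducing points, SPD
Kuf = [[F(1), F(2)], [F(0), F(1)]]   # pretend |f| = 2 so everything stays 2x2
Lam = [[F(2), F(0)], [F(0), F(3)]]
Kfu, Kuu_inv = transpose(Kuf), inv2(Kuu)

lhs = matmul(matmul(matmul(Kuu_inv, Kuf),
                    inv2([[l + q for l, q in zip(r1, r2)]
                          for r1, r2 in zip(Lam, matmul(matmul(Kfu, Kuu_inv), Kuf))])),
             matmul(Kfu, Kuu_inv))
rhs = msub(Kuu_inv, inv2([[k + q for k, q in zip(r1, r2)]
                          for r1, r2 in zip(Kuu, matmul(matmul(Kuf, inv2(Lam)), Kfu))]))
assert lhs == rhs
```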
Now break up :math:`\Kuf` and :math:`\Lam`:

.. math::

   \Kuf &= \begin{bmatrix} \mathbf{K}_{u \cancel{B}} & \mathbf{K}_{u B} \end{bmatrix} \\
   \Lam &= \begin{bmatrix} \Lam_{\cancel{B}} & \mathbf{0} \\ \mathbf{0} & \Lam_{B} \end{bmatrix}
so that

.. math::

   \Kuf \Lam^{-1} \Kfu &= \begin{bmatrix} \mathbf{K}_{u \cancel{B}} & \mathbf{K}_{u B} \end{bmatrix}
   \begin{bmatrix} \Lam_{\cancel{B}}^{-1} & \mathbf{0} \\ \mathbf{0} & \Lam_{B}^{-1} \end{bmatrix}
   \begin{bmatrix} \mathbf{K}_{\cancel{B} u} \\ \mathbf{K}_{B u} \end{bmatrix} \\
   &= \mathbf{K}_{u \cancel{B}} \Lam_{\cancel{B}}^{-1} \mathbf{K}_{\cancel{B} u} +
   \mathbf{K}_{u B} \Lam_{B}^{-1} \mathbf{K}_{B u} \\
   \mathbf{K}_{u \cancel{B}} \Lam_{\cancel{B}}^{-1} \mathbf{K}_{\cancel{B} u} &= \Kuf \Lam^{-1} \Kfu - \mathbf{K}_{u B} \Lam_{B}^{-1} \mathbf{K}_{B u}
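The additive separation above is easy to check numerically. A small pure-Python sanity check with made-up matrices (two inducing points, two training points each in :math:`\cancel{B}` and :math:`B`), using exact rational arithmetic:

```python
# Numeric check that a block-diagonal Lambda makes Kuf Lam^{-1} Kfu separate
# additively over blocks, with no cross terms.  Made-up small matrices.
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(M):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def transpose(M):
    return [list(r) for r in zip(*M)]

K_uBc = [[F(1), F(2)], [F(3), F(4)]]   # K_{u, \cancel{B}}
K_uB  = [[F(5), F(6)], [F(7), F(8)]]   # K_{u, B}
Lam_Bc = [[F(2), F(1)], [F(1), F(3)]]  # Lambda block for \cancel{B}
Lam_B  = [[F(4), F(1)], [F(1), F(2)]]  # Lambda block for B

# assemble the full K_uf and block-diagonal Lam^{-1}
K_uf = [rc + rb for rc, rb in zip(K_uBc, K_uB)]
Li_Bc, Li_B = inv2(Lam_Bc), inv2(Lam_B)
Lam_inv = [Li_Bc[0] + [F(0), F(0)],
           Li_Bc[1] + [F(0), F(0)],
           [F(0), F(0)] + Li_B[0],
           [F(0), F(0)] + Li_B[1]]

full = matmul(matmul(K_uf, Lam_inv), transpose(K_uf))
per_block = [[x + y for x, y in zip(r1, r2)]
             for r1, r2 in zip(matmul(matmul(K_uBc, Li_Bc), transpose(K_uBc)),
                               matmul(matmul(K_uB, Li_B), transpose(K_uB)))]
assert full == per_block
```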
This last one is the inverse term in the training-data-dependent part of the PITC posterior covariance using only the training points in :math:`\cancel{B}`, which I think is what we want to combine with the dense posterior from the training points in :math:`B`:

.. math::

   (\sigma_*^2)^{\cancel{B} \cancel{B}} = K_* - \mathbf{K}_{*u} \Kuu^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \left( \Kuu + \mathbf{K}_{u \cancel{B}} \Lam_{\cancel{B}}^{-1} \mathbf{K}_{\cancel{B} u} \right)^{-1} \mathbf{K}_{u*}

which we can write in terms of only the full PITC prior and the prior for group :math:`B`:

.. math::

   (\sigma_*^2)^{\cancel{B} \cancel{B}} = K_* - \mathbf{K}_{*u} \Kuu^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \left( \Kuu + \Kuf \Lam^{-1} \Kfu - \mathbf{K}_{u B} \Lam_{B}^{-1} \mathbf{K}_{B u} \right)^{-1} \mathbf{K}_{u*}
If you set :math:`B = \emptyset`, removing the correction for the training points in block :math:`B` entirely, you recover the original PITC posterior covariance based on the full set of training points :math:`f`, as expected. Now we only have to compute :math:`\Kuf \Lam^{-1} \Kfu` (the term that scales with the full training set :math:`f`) once, and nothing that scales with :math:`|\cancel{B}|`. This still involves an :math:`O(M^3)` decomposition for each group :math:`B`. I have been assuming our group size :math:`|B|` is significantly smaller than the number of inducing points :math:`M`, so that's not great, but an interesting thing happens if we use the "downdate" form of Woodbury's lemma in the opposite direction to the usual one:
.. math::

   \left(A - B C^{-1} B^T \right)^{-1} = A^{-1} + A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1}
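To reassure ourselves that this downdate form behaves on real symmetric matrices, here's a quick exact-arithmetic check on a tiny made-up symmetric example (pure Python, :math:`n = 2`, :math:`k = 1` so :math:`C` is a scalar):

```python
# Check the downdate identity (A - B C^{-1} B^T)^{-1}
#   = A^{-1} + A^{-1} B (C - B^T A^{-1} B)^{-1} B^T A^{-1}
# on a symmetric positive-definite A with exact rational arithmetic.
from fractions import Fraction as F

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inv2(M):
    # inverse of a 2x2 matrix
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[F(5), F(1)], [F(1), F(4)]]   # symmetric, positive definite
B = [[F(1)], [F(2)]]               # n x k with k = 1
C = F(3)                           # 1x1, treated as a scalar
Bt = [[B[0][0], B[1][0]]]

A_inv = inv2(A)
# left-hand side: (A - B C^{-1} B^T)^{-1}
BCBt = matmul(matmul(B, [[1 / C]]), Bt)
lhs = inv2([[A[i][j] - BCBt[i][j] for j in range(2)] for i in range(2)])
# right-hand side: A^{-1} + A^{-1} B (C - B^T A^{-1} B)^{-1} B^T A^{-1}
S = C - matmul(matmul(Bt, A_inv), B)[0][0]   # scalar Schur complement
corr = matmul(matmul(matmul(matmul(A_inv, B), [[1 / S]]), Bt), A_inv)
rhs = [[A_inv[i][j] + corr[i][j] for j in range(2)] for i in range(2)]
assert lhs == rhs
```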
Letting :math:`\mathbf{A} = \Kuu + \Kuf \Lam^{-1} \Kfu` (playing the part of :math:`A` in Woodbury's lemma):
.. math::

   (\sigma_*^2)^{\cancel{B} \cancel{B}} &= K_* - \mathbf{K}_{*u} \Kuu^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \left( \mathbf{A}^{-1} + \mathbf{A}^{-1} \mathbf{K}_{u B} \left(\Lam_{B} - \mathbf{K}_{B u} \mathbf{A}^{-1} \mathbf{K}_{u B}\right)^{-1} \mathbf{K}_{B u} \mathbf{A}^{-1} \right) \mathbf{K}_{u*} \\
   &= K_* - \mathbf{K}_{*u} \Kuu^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \mathbf{A}^{-1} \mathbf{K}_{u*} + \mathbf{K}_{*u} \mathbf{A}^{-1} \mathbf{K}_{u B} \left(\Lam_{B} - \mathbf{K}_{B u} \mathbf{A}^{-1} \mathbf{K}_{u B}\right)^{-1} \mathbf{K}_{B u} \mathbf{A}^{-1} \mathbf{K}_{u*}
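As a sanity check that this downdated form really matches the direct expression for :math:`(\sigma_*^2)^{\cancel{B} \cancel{B}}`, here's a small exact-arithmetic comparison of the two (made-up matrices; the :math:`K_*` and :math:`\Kuu^{-1}` terms are common to both forms, so they're omitted):

```python
# Compare, with exact rationals:
#   form 1: K*u (Kuu + Kuf Lam^{-1} Kfu - K_uB LamB^{-1} K_Bu)^{-1} K_u*
#   form 2: K*u A^{-1} K_u*
#         + K*u A^{-1} K_uB (LamB - K_Bu A^{-1} K_uB)^{-1} K_Bu A^{-1} K_u*
# where A = Kuu + Kuf Lam^{-1} Kfu.  All matrices are small made-up examples.
from fractions import Fraction as F

def matmul(*Ms):
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
               for row in out]
    return out

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def madd(A, B, sign=1):
    return [[x + sign * y for x, y in zip(r1, r2)] for r1, r2 in zip(A, B)]

def transpose(M):
    return [list(r) for r in zip(*M)]

Kuu = [[F(3), F(1)], [F(1), F(2)]]
K_uBc = [[F(1), F(0)], [F(0), F(1)]]   # K_{u, \cancel{B}}
K_uB = [[F(1), F(2)], [F(2), F(1)]]
Lam_Bc = [[F(2), F(0)], [F(0), F(2)]]
Lam_B = [[F(3), F(1)], [F(1), F(3)]]
K_su = [[F(1), F(1)]]                  # K_{* u}, one prediction point
K_us = transpose(K_su)

QBc = matmul(K_uBc, inv2(Lam_Bc), transpose(K_uBc))
QB = matmul(K_uB, inv2(Lam_B), transpose(K_uB))
A = madd(Kuu, madd(QBc, QB))           # Kuu + Kuf Lam^{-1} Kfu
A_inv = inv2(A)

form1 = matmul(K_su, inv2(madd(A, QB, sign=-1)), K_us)
D = inv2(madd(Lam_B, matmul(transpose(K_uB), A_inv, K_uB), sign=-1))
form2 = madd(matmul(K_su, A_inv, K_us),
             matmul(K_su, A_inv, K_uB, D, transpose(K_uB), A_inv, K_us))
assert form1 == form2
```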
I find the use of Woodbury's lemma to downdate a bit uncomfortable -- does this work with real symmetric matrices? But we would only need to decompose :math:`\mathbf{A}` (and also :math:`\Kuu`) at a cost of :math:`O(M^3)` once; the inversion to be calculated per group costs :math:`O(|B|^3)`. I also don't immediately spot a slick QR-decomposition way to do this per-group term; it's pretty symmetrical. The only thing that jumps out is precomputing :math:`\mathbf{K}_{B u} \mathbf{A}^{-\frac{1}{2}}`.
----------------------------------------------
Derivation of Woodbury's Lemma for Differences
----------------------------------------------

This is specifically for symmetric matrices, to convince myself the downdate formula is legit. We would like to compute a rank-:math:`k` downdate to the inverse of a matrix :math:`A`:

.. math::

   \left(A - B C^{-1} B^T \right)^{-1} = A^{-1} + A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1}
To be extra clear, let's say :math:`A \in \mathbb{R}^{n \times n}`, :math:`B \in \mathbb{R}^{n \times k}`, :math:`C \in \mathbb{R}^{k \times k}`. We put the relevant matrices into a block matrix :math:`M \in \mathbb{R}^{m \times m}` for :math:`m = n + k`, and label the conforming blocks of its inverse:

.. math::

   M &= \begin{pmatrix} A & B \\ B^T & C\end{pmatrix} \\
   M^{-1} &= \begin{pmatrix} W & X \\ X^T & Y\end{pmatrix} \\
   M M^{-1} &= \begin{pmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{pmatrix} \\
   M^{-1} M &= \begin{pmatrix} \mathbf{I} & \mathbf{0} \\ \mathbf{0} & \mathbf{I} \end{pmatrix}
The blocks of :math:`M M^{-1}` yield the following equations:

.. math::

   AW + BX^T &= \mathbf{I} \\
   AX + BY &= \mathbf{0} \\
   B^T W + CX^T &= \mathbf{0} \\
   B^T X + CY &= \mathbf{I}
Rearrange the middle two:

.. math::

   X &= -A^{-1} B Y \\
   X^T &= -C^{-1} B^T W
If we do the same for :math:`M^{-1} M`:

.. math::

   X &= -W B C^{-1} \\
   X^T &= -Y B^T A^{-1}
These blocks are equal:

.. math::

   C^{-1} B^T W &= Y B^T A^{-1} \\
   A^{-1} B Y &= W B C^{-1}
Now use the middle two equations from :math:`M M^{-1}` and plug them into the first and last equations respectively:

.. math::

   W - A^{-1} B C^{-1} B^T W &= A^{-1} \\
   \left(I - A^{-1} B C^{-1} B^T\right) W &= A^{-1} \\
   -C^{-1} B^T A^{-1} B Y + Y &= C^{-1} \\
   \left(I - C^{-1} B^T A^{-1} B\right)Y &= C^{-1}
Assuming :math:`\left(I - A^{-1} B C^{-1} B^T\right)` and :math:`\left(I - C^{-1} B^T A^{-1} B\right)` are invertible, rearrange:

.. math::

   W &= \left(I - A^{-1} B C^{-1} B^T\right)^{-1} A^{-1} \\
   &= \left(A - B C^{-1} B^T\right)^{-1} \\
   Y &= \left(I - C^{-1} B^T A^{-1} B\right)^{-1} C^{-1} \\
   &= \left(C - B^T A^{-1} B\right)^{-1}
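A quick exact-arithmetic check that :math:`W`, :math:`X`, and :math:`Y` as derived really assemble into :math:`M^{-1}` (tiny made-up symmetric example with :math:`n = 2`, :math:`k = 1`):

```python
# Build W, Y from the Schur-complement formulas above and X = -A^{-1} B Y,
# assemble M^{-1} blockwise, and check M M^{-1} is the identity exactly.
# A and C are symmetric made-up matrices; C is a scalar so n = 2, k = 1.
from fractions import Fraction as F

def matmul(*Ms):
    out = Ms[0]
    for B in Ms[1:]:
        out = [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
               for row in out]
    return out

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[F(5), F(1)], [F(1), F(4)]]
B = [[F(1)], [F(2)]]
C = F(3)
Bt = [[B[0][0], B[1][0]]]
A_inv = inv2(A)

# W = (A - B C^{-1} B^T)^{-1},  Y = (C - B^T A^{-1} B)^{-1},  X = -A^{-1} B Y
BCBt = matmul(B, [[1 / C]], Bt)
W = inv2([[A[i][j] - BCBt[i][j] for j in range(2)] for i in range(2)])
Y = 1 / (C - matmul(Bt, A_inv, B)[0][0])
X = [[-r[0] * Y] for r in matmul(A_inv, B)]

# assemble M and its claimed inverse (X^T is the transpose of X since A is
# symmetric and Y is a scalar)
M = [A[0] + B[0], A[1] + B[1], Bt[0] + [C]]
M_inv = [W[0] + X[0], W[1] + X[1], [X[0][0], X[1][0], Y]]
identity = [[F(1) if i == j else F(0) for j in range(3)] for i in range(3)]
assert matmul(M, M_inv) == identity
```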
Now use equality of the off-diagonal blocks from the two ways to multiply :math:`M` and :math:`M^{-1}` above (don't you wish Sphinx would number equations?) and substitute:

.. math::

   C^{-1} B^T \left(A - B C^{-1} B^T\right)^{-1} &= \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1} \\
   \left(A - B C^{-1} B^T\right)^{-1} B C^{-1} &= A^{-1} B \left(C - B^T A^{-1} B\right)^{-1}
Let's look just at the term :math:`\left(A - B C^{-1} B^T\right)` and right-multiply by :math:`-A^{-1}`:

.. math::

   \left(A - B C^{-1} B^T\right) \left(-A^{-1}\right) &= -\mathbf{I} + B C^{-1} B^T A^{-1} \\
   \mathbf{I} + \left(A - B C^{-1} B^T\right) \left(-A^{-1}\right) &= B C^{-1} B^T A^{-1}
Now let's return to the previous equation and right-multiply by :math:`B^T A^{-1}`:

.. math::

   \left(A - B C^{-1} B^T\right)^{-1} B C^{-1} B^T A^{-1} = A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1}
Substitute the previous result for :math:`B C^{-1} B^T A^{-1}`:

.. math::

   \left(A - B C^{-1} B^T\right)^{-1} \left( \mathbf{I} + \left(A - B C^{-1} B^T\right) \left(-A^{-1}\right) \right) &= A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1} \\
   \left(A - B C^{-1} B^T\right)^{-1} - A^{-1} &= A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1} \\
   \left(A - B C^{-1} B^T \right)^{-1} &= A^{-1} + A^{-1} B \left(C - B^T A^{-1} B\right)^{-1} B^T A^{-1} \quad \blacksquare

which is the difference form of Woodbury's lemma, or a rank-:math:`k` downdate of :math:`A^{-1}`. The main assumption seems to be that :math:`\left(I - A^{-1} B C^{-1} B^T\right)` and :math:`\left(I - C^{-1} B^T A^{-1} B\right)` are invertible.
------------------------
Blockwise inversion of S
------------------------

I tried to invert :math:`\mathbf{S}` blockwise; it didn't work out, but I'm leaving it in here just to look back at:

.. math::

   \newcommand{\VV}{\mathbf{V}}
   \mathbf{U} &= \mathbf{Q}_{* \cancel{B}} \mathbf{S}^{-1}_{\cancel{B} B} \VV_{B *} \\
   &= \mathbf{Q}_{* \cancel{B}} \left( \mathbf{Q}_{\cancel{B} \cancel{B}}^{-1} \mathbf{Q}_{\cancel{B} B} \left(\mathbf{Q}_{B B} - \mathbf{Q}_{B \cancel{B}} \mathbf{Q}_{\cancel{B} \cancel{B}}^{-1} \mathbf{Q}_{\cancel{B} B}\right)^{-1} \right) \VV_{B *}

How are we going to avoid the :math:`O(|\cancel{B}|^3)` inversion for :math:`\mathbf{Q}_{\cancel{B} \cancel{B}}`? Also, we are going to get cross terms involving both :math:`\mathbf{Q}_{* \cancel{B}}` and :math:`\mathbf{Q}_{B B}`, but maybe they are OK because they are only :math:`O(|\cancel{B}| |B|)`?
Still reviewing the equations of the first section.

The Woodbury lemma variant proof is pretty brutal!

I wonder whether the condition you highlighted in the conclusion is necessary in absolute terms, or whether it's just an artifact of the derivation. Maybe one could just multiply the claimed inverse by (A - B C^{-1} B^T) on the right or left and check that it yields the identity, and maybe that doesn't assume these conditions? (Just an idea.)