Introduce PIC approximation #492
base: master
Conversation
doc/src/crappy-pic.rst
Outdated
Still reviewing the equations of the first section.
The Woodbury lemma variant proof is pretty brutal!
I wonder whether the condition you highlighted in the conclusion is necessary in general, or only an artifact of the derivation. Maybe one could just multiply the proposed inverse by (A - B C⁻¹ Bᵀ) on the right or left and check that it gives the identity; perhaps that doesn't require these conditions? (Just an idea.)
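Something like this standalone numerical check might settle it. (Sketch only: the identity written below is the textbook Woodbury variant, so swap in the exact form from the doc; sizes and values are arbitrary.)

// Standalone sketch: numerically check that a candidate inverse of
// (A - B C^-1 B^T) really is an inverse, without assuming any of the
// conditions from the derivation.
#include <Eigen/Dense>
#include <iostream>

int main() {
  const int n = 5, m = 3;
  // Random SPD matrices so all the inverses exist.
  const Eigen::MatrixXd A0 = Eigen::MatrixXd::Random(n, n);
  const Eigen::MatrixXd A =
      A0 * A0.transpose() + 10.0 * Eigen::MatrixXd::Identity(n, n);
  const Eigen::MatrixXd C0 = Eigen::MatrixXd::Random(m, m);
  const Eigen::MatrixXd C =
      C0 * C0.transpose() + 10.0 * Eigen::MatrixXd::Identity(m, m);
  const Eigen::MatrixXd B = Eigen::MatrixXd::Random(n, m);

  const Eigen::MatrixXd M = A - B * C.inverse() * B.transpose();
  // Candidate inverse from the standard Woodbury-style identity:
  //   M^-1 = A^-1 + A^-1 B (C - B^T A^-1 B)^-1 B^T A^-1
  const Eigen::MatrixXd Ainv = A.inverse();
  const Eigen::MatrixXd candidate =
      Ainv + Ainv * B * (C - B.transpose() * Ainv * B).inverse() *
                 B.transpose() * Ainv;

  // Both residuals should be near machine precision.
  std::cout << (M * candidate - Eigen::MatrixXd::Identity(n, n)).norm()
            << std::endl;
  std::cout << (candidate * M - Eigen::MatrixXd::Identity(n, n)).norm()
            << std::endl;
  return 0;
}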
doc/src/sparse-gp-details.rst
Outdated
Still reviewing / trying to understand; I'll follow up offline if I find anything in the equations.
    K_xu[i] = cov_(x, inducing_points_[i]);
    K_uy[i] = cov_(inducing_points_[i], y);
  }
  // const Eigen::VectorXd K_uy = cov_(inducing_points_, y);
nit: remove comment
template <typename CovarianceType, typename InducingFeatureType,
          typename GrouperFunction>
class BruteForcePIC
Can you add a comment describing what brute-force PIC does?
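Maybe something along these lines, if my reading is right (that it forms the PIC predictive densely as a reference for the tests; please adjust if not):

// (Suggested wording -- correct me if this isn't what the class does.)
//
// A brute-force reference implementation of the PIC predictive
// distribution: forms the relevant covariance blocks densely and makes
// no attempt at efficiency. Useful for validating the sparse
// implementation in tests; not intended for production use.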
void shift_mean(const Eigen::VectorXd &mean_shift) {
  ALBATROSS_ASSERT(mean_shift.size() == information.size());
  information += train_covariance.solve(mean_shift);
Do we also have to shift the local group values (mean_w?)? Maybe worth just hard-failing here (or removing this?) if it hasn't been tested.
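E.g., a minimal hard-fail until this has been verified (sketch only; keeps the existing signature and assert macro):

void shift_mean(const Eigen::VectorXd &mean_shift) {
  // Untested for PIC fits: the per-group terms (mean_w) may also need
  // shifting. Fail loudly until that has been worked out.
  (void)mean_shift;
  ALBATROSS_ASSERT(false && "shift_mean is not yet supported for PIC fits");
}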
};

/*
 * This class implements an approximation technique for Gaussian processes
Is this for PIC or PITC?
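For context, the distinction as I understand it from the Snelson & Ghahramani paper (worth stating in whichever comment this ends up in): PITC assumes the training blocks are conditionally independent of each other, and of the test point, given the inducing values; PIC additionally keeps the exact covariance between a test point and the block it is grouped with. Roughly, with u the inducing values and B_i the training blocks:

q_{\mathrm{PITC}}(\mathbf{f}, f_*) = \Big[ \prod_i p(\mathbf{f}_{B_i} \mid \mathbf{u}) \Big] \, p(f_* \mid \mathbf{u})

q_{\mathrm{PIC}}(\mathbf{f}, f_*) = \Big[ \prod_{i \neq b(*)} p(\mathbf{f}_{B_i} \mid \mathbf{u}) \Big] \, p(\mathbf{f}_{B_{b(*)}}, f_* \mid \mathbf{u})

where b(*) is the block the test point is assigned to by the grouper.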
MarginalDistribution _predict_impl(
    const std::vector<FeatureType> &features,
    const Fit<PICGPFit<GrouperFunction, InducingFeatureType, FitFeatureType>>
        &sparse_gp_fit,
    PredictTypeIdentity<MarginalDistribution> &&) const {
  const auto K_up =
      this->covariance_function_(sparse_gp_fit.inducing_features, features);
  Eigen::VectorXd mean = gp_mean_prediction(K_up, sparse_gp_fit.information);
  this->mean_function_.add_to(features, &mean);

  // First pass: compute an O(1) mapping from features to groups
  std::vector<std::size_t> feature_to_block(features.size());
  std::vector<decltype(sparse_gp_fit.measurement_groups.begin())> groups(
      features.size());
  Eigen::VectorXi col_alloc{Eigen::VectorXi::Zero(features.size())};
  bool all_same_group = true;
  for (Eigen::Index j = 0; j < cast::to_index(features.size()); ++j) {
    groups[j] = sparse_gp_fit.measurement_groups.find(
        independent_group_function_(without_measurement(features[j])));
    if (groups[j] != sparse_gp_fit.measurement_groups.end()) {
      col_alloc(j) = groups[j]->second.block_size;
      all_same_group = all_same_group && groups[j] == groups[0];
    }
    feature_to_block[j] =
        std::distance(sparse_gp_fit.measurement_groups.begin(), groups[j]);
  }

  // Second pass: compute mean vector and fill sparse Vp matrix
  Eigen::VectorXd mean_correction = Eigen::VectorXd::Zero(features.size());
  Eigen::SparseMatrix<double> Vp(sparse_gp_fit.train_features.size(),
                                 features.size());
  Vp.reserve(col_alloc);
  for (Eigen::Index j = 0; j < cast::to_index(features.size()); ++j) {
    if (groups[j] == sparse_gp_fit.measurement_groups.end()) {
      continue;
    }

    // Vbp: covariance between prediction point j and its training block,
    // minus the part already explained through the inducing points. This
    // is the local PIC correction for column j.
    const auto &B = groups[j]->second;
    Eigen::VectorXd Vbp =
        this->covariance_function_(B.dataset.features,
                                   std::vector<FeatureType>{features[j]}) -
        K_up.col(j).transpose() * sparse_gp_fit.covariance_Y[B.block_index];
    for (Eigen::Index i = 0; i < Vbp.size(); ++i) {
      Vp.insert(B.initial_row + i, j) = Vbp(i);
    }
    mean_correction[j] = Vbp.dot(sparse_gp_fit.mean_w[B.block_index]);
  }
  Vp.makeCompressed();

  // VSV_diag = diag(Vp^T (A^-1 - Z^T Z) Vp): the variance contribution
  // of the local block corrections.
  Eigen::MatrixXd xi_lambda = sparse_gp_fit.A_ldlt.sqrt_solve(Vp);
  Eigen::MatrixXd xi_u = sparse_gp_fit.Z * Vp;
  Eigen::VectorXd VSV_diag{
      (xi_lambda.transpose() * xi_lambda - xi_u.transpose() * xi_u)
          .diagonal()};

  // U_diag = diag(K_up^T W Vp): cross term between the inducing-point
  // and local corrections.
  const Eigen::VectorXd U_diag =
      (K_up.transpose() * sparse_gp_fit.W * Vp).diagonal();

  // Prior variance at each prediction point.
  Eigen::VectorXd marginal_variance(cast::to_index(features.size()));
  for (Eigen::Index i = 0; i < marginal_variance.size(); ++i) {
    marginal_variance[i] = this->covariance_function_(
        features[cast::to_size(i)], features[cast::to_size(i)]);
  }

  // Q_diag: variance explained through the inducing points.
  const Eigen::MatrixXd Q_sqrt =
      sparse_gp_fit.train_covariance.sqrt_solve(K_up);
  const Eigen::VectorXd Q_diag =
      Q_sqrt.cwiseProduct(Q_sqrt).array().colwise().sum();
  marginal_variance -= Q_diag;

  // S_diag: posterior uncertainty in the inducing values, added back.
  const Eigen::MatrixXd S_sqrt =
      sqrt_solve(sparse_gp_fit.sigma_R, sparse_gp_fit.P, K_up);
  const Eigen::VectorXd S_diag =
      S_sqrt.cwiseProduct(S_sqrt).array().colwise().sum();
  marginal_variance += S_diag;

  mean += mean_correction;

  return MarginalDistribution(mean,
                              marginal_variance - (2 * U_diag + VSV_diag));
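For future readers, my reading of the variance assembled above, with symbols matching the variables in the code:

\operatorname{var}[f_*] \approx k(x_*, x_*) - Q_{\mathrm{diag}} + S_{\mathrm{diag}} - 2\,U_{\mathrm{diag}} - VSV_{\mathrm{diag}}

Here U_diag and VSV_diag are the cross and local-block corrections that distinguish PIC from PITC; both vanish for a prediction point whose group does not match any training group (its Vp column is left empty), in which case this reduces to the usual PITC variance.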
👏 can't say I follow all of it, but I know this was a tough one. Well done.
tests/test_pic_gp.cc
Outdated
#include <fstream>
#include <gtest/gtest.h>

// #include "albatross/src/eigen/serializable_ldlt.hpp"
nit: remove
A few nits, and (as you're aware) I'm sure there are optimizations/improvements to be made, BUT this is f*&#ing awesome. I don't see any changes here that would break backward compatibility for existing PITC use cases. Good job!
This commit implements the Partially Independent Conditional (PIC) sparse GP approximation introduced by Snelson & Ghahramani (2007):
https://proceedings.mlr.press/v2/snelson07a/snelson07a.pdf
Although I'm not aware of any outstanding bugs, this approximation remains experimental at the time of writing.
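For anyone wanting to try it, a hypothetical usage sketch. The constructor name pic_gp_from_covariance, the argument order, and the covariance/inducing-point types are assumptions patterned on the existing sparse GP code and tests, not a confirmed API; check the headers before use. The fit/predict calls follow the usual albatross model interface.

// Hypothetical sketch -- names and argument order assumed from the
// existing sparse GP pattern; verify against the headers.
albatross::SquaredExponential<albatross::EuclideanDistance> covariance;
albatross::UniformlySpacedInducingPoints strategy(20);
// The grouper assigns each feature to an independent local block; here,
// one block per unit interval (illustration only).
auto grouper = [](const double &x) { return static_cast<long>(x); };

auto model = pic_gp_from_covariance(covariance, grouper, strategy, "pic");
const auto fit_model = model.fit(dataset);
const auto prediction = fit_model.predict(test_features).marginal();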