Why doesn't qUpperConfidenceBound use ReLU of the deviation instead of the absolute deviation from the posterior mean? #1672
Replies: 3 comments 4 replies
-
Hmm, this is a good point. @j-wilson is the right person for this question :)
-
@AlexStreicher thank you for the suggestion. Have you done any benchmarks with this alternative formulation?
-
Hi @AlexStreicher, This is an interesting question. The main reason for introducing q-UCB in the paper you referenced was to argue that reinterpreting acquisition functions as expectations can facilitate the design of batch variants. We introduced a UCB-like batch acquisition function, but this is not necessarily the best one. If you think that it can be improved, then I'd love to hear more! Also, it sounds like you're less than thrilled with our loose usage of the term UCB, and I agree that we could have done a better job here. As I recall it, we primarily considered two candidates for q-UCB:
Both of these were intended to retain the spirit of optimism that UCB is known for. In (1), this is accomplished by explicitly assuming that each outcome will be favorable. Unfortunately, this formulation strongly discourages batches of anti-correlated queries (since I'm curious to learn more about your suggestion of using |
-
I was looking at the definition of qUpperConfidenceBound and I was getting a little nervous about the fact that it uses absolute deviation even while sampling q correlated points simultaneously.
To review, in the referenced paper (Wilson et al., Appendix A) they start with the q=1 UpperConfidenceBound
$$UCB(x; \beta) \equiv \mu(x)+\beta \sigma(x),\quad y|x\sim\mathcal{N}(\mu(x),\sigma^2(x)).$$
For reference, when q=1, $y, \mu, \sigma$ are all scalars, and we will be dropping explicit arguments of x moving forward.
To prepare for Monte Carlo sampling and reparametrization, they reformulate it by rewriting the standard deviation as an integral over the distribution of $y|x$:
$$UCB(x; \beta) \equiv \mu+\beta \sigma=\int dy \left(\mu+\beta\sqrt{\frac{\pi}{2}}|y-\mu|\right) \rho(y),\quad y|x\sim\mathcal{N}(\mu,\sigma^2),$$
using the fact that for a Gaussian variable, $\mathbb{E}[|y-\mu|]=\sigma\sqrt{2/\pi}$.
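As a quick sanity check of the identity above, here is a minimal sketch in standard-library Python. It is not BoTorch code: the function names and the parameter values are made up for illustration. It compares the closed form $\mu+\beta\sigma$ with a Monte Carlo estimate of the integral.

```python
import math
import random


def ucb_closed_form(mu, sigma, beta):
    """Analytic q=1 UCB: mu + beta * sigma."""
    return mu + beta * sigma


def ucb_monte_carlo(mu, sigma, beta, n_samples=200_000, seed=0):
    """Estimate E[mu + beta * sqrt(pi/2) * |y - mu|] with y ~ N(mu, sigma^2)."""
    rng = random.Random(seed)
    scale = beta * math.sqrt(math.pi / 2.0)
    total = 0.0
    for _ in range(n_samples):
        y = rng.gauss(mu, sigma)  # second argument is the standard deviation
        total += mu + scale * abs(y - mu)
    return total / n_samples


# Illustrative values only (assumptions, not taken from the paper).
exact = ucb_closed_form(mu=0.3, sigma=1.5, beta=2.0)
estimate = ucb_monte_carlo(mu=0.3, sigma=1.5, beta=2.0)
print(f"closed form: {exact:.4f}, Monte Carlo: {estimate:.4f}")
```

The two numbers should agree up to Monte Carlo noise, which shrinks as `n_samples` grows.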
They then propose a parallel version of UpperConfidenceBound for q>1 simultaneous points:
$$qUCB(\mathbf{x}; \beta) \coloneqq \int d\mathbf{y} \; \max_{q} \left(\boldsymbol{\mu}+\beta\sqrt{\frac{\pi}{2}}|\mathbf{y}-\boldsymbol{\mu}|\right) \rho(\mathbf{y}),\quad \mathbf{y}|\mathbf{x}\sim\mathcal{N}(\boldsymbol{\mu},\boldsymbol{\Sigma})$$
$$\mathbf{y}=(y^{(1)},\dots,y^{(q)}),\quad \boldsymbol{\mu}=(\mu^{(1)},\dots,\mu^{(q)}),\quad \boldsymbol{\Sigma}=\begin{pmatrix}\mathrm{Cov}(y^{(1)},y^{(1)}) & \cdots & \mathrm{Cov}(y^{(1)},y^{(q)})\\ \vdots & \ddots & \vdots\\ \mathrm{Cov}(y^{(q)},y^{(1)}) & \cdots & \mathrm{Cov}(y^{(q)},y^{(q)})\end{pmatrix}$$
The idea here is that we're integrating this max-quantity over our multivariate posterior distribution of q simultaneous points.
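To make the integral concrete, here is a rough sketch of a Monte Carlo estimate of this quantity for q=2, using the reparametrization $\mathbf{y}=\boldsymbol{\mu}+L\mathbf{z}$ with $L$ the Cholesky factor of $\boldsymbol{\Sigma}$. It uses only the Python standard library. The function name `q_ucb_two_points`, the parametrization by standard deviations and a correlation `rho`, and all numeric values are assumptions made for illustration. This is not the BoTorch implementation.

```python
import math
import random


def q_ucb_two_points(mu, sigma, rho, beta, n_samples=100_000, seed=0):
    """Monte Carlo estimate of the q=2 integral above.

    mu, sigma: length-2 sequences of posterior means and standard deviations.
    rho: posterior correlation between the two outcomes, in [-1, 1].
    """
    rng = random.Random(seed)
    scale = beta * math.sqrt(math.pi / 2.0)
    ortho = math.sqrt(1.0 - rho * rho)
    total = 0.0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Deviations y - mu = L z for the 2x2 covariance implied by sigma, rho.
        dev1 = sigma[0] * z1
        dev2 = sigma[1] * (rho * z1 + ortho * z2)
        total += max(mu[0] + scale * abs(dev1), mu[1] + scale * abs(dev2))
    return total / n_samples


# Illustrative values only.
mu, sigma, beta = (0.0, 0.2), (1.0, 0.5), 2.0
best_single = max(m + beta * s for m, s in zip(mu, sigma))
for rho in (-0.9, 0.0, 0.9):
    print(rho, q_ucb_two_points(mu, sigma, rho, beta), best_single)
```

Since the expectation of a max is at least the max of the expectations (Jensen's inequality), the estimate should not fall below the largest single-point UCB, apart from sampling noise.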
However, note what we're integrating. At any point in the $\mathbf{y}$-domain, we're not taking the value of the 1-UCB for whichever of the q points has the max UCB. Instead, we're taking the max of a quantity that becomes the 1-UCB only once it is marginalized and integrated over. For the toy example where the q points are completely uncorrelated, there's no difference between those two statements. However, when the multivariate posterior being considered has strong (anti)correlation between the outcomes (e.g. q=2 and the $y$'s are strongly (anti)correlated), it is not clear to me that the above quantity should be referred to as "UpperConfidenceBound".
Why not replace $\beta\sqrt{\pi/2}\,|y-\mu|$ with $\beta\sqrt{2\pi}\,\mathrm{ReLU}(y-\mu)$? In the code, this would amount to taking the line
`ucb_samples = mean + self.beta_prime * (obj - mean).abs()`
and replacing it with
`ucb_samples = mean + 2 * self.beta_prime * (obj - mean).clamp_min(0)`.
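The factor of 2 comes from $\mathbb{E}[\mathrm{ReLU}(y-\mu)]=\sigma/\sqrt{2\pi}$, which is half of $\mathbb{E}[|y-\mu|]$, so both forms reduce to the same value when q=1. For q=2 they can differ, and the following standard-library Python sketch compares them. The function name `q_ucb_variants`, the zero means, unit variances, and the value of `beta` are illustrative assumptions. Here `abs_scale` plays the role of the post's $\beta\sqrt{\pi/2}$ and is not tied to how BoTorch defines `beta_prime`.

```python
import math
import random


def q_ucb_variants(rho, beta=2.0, n_samples=100_000, seed=0):
    """Return (abs_estimate, relu_estimate) for q=2, zero means, unit variances."""
    rng = random.Random(seed)
    abs_scale = beta * math.sqrt(math.pi / 2.0)  # current form: scale * |y - mu|
    relu_scale = 2.0 * abs_scale                 # proposed form: 2 * scale * ReLU(y - mu)
    ortho = math.sqrt(1.0 - rho * rho)
    abs_total = 0.0
    relu_total = 0.0
    for _ in range(n_samples):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        d1, d2 = z1, rho * z1 + ortho * z2  # correlated deviations y - mu
        abs_total += max(abs_scale * abs(d1), abs_scale * abs(d2))
        relu_total += max(relu_scale * max(d1, 0.0), relu_scale * max(d2, 0.0))
    return abs_total / n_samples, relu_total / n_samples


for rho in (-0.9, 0.0, 0.9):
    abs_est, relu_est = q_ucb_variants(rho)
    print(f"rho={rho:+.1f}  abs={abs_est:.3f}  relu={relu_est:.3f}")
```

The absolute-value form takes the same value at `rho` and `-rho`, because flipping the sign of one deviation leaves $|y-\mu|$ unchanged, so it cannot tell correlated batches from anti-correlated ones. The ReLU form can: in this toy setup its estimate is larger for anti-correlated points than for positively correlated ones.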