linear interpolation? #14
I think it depends on the shape of your latent space. If your latent space is spherical already (i.e. you learnt to sample from a gaussian while training, rather than from uniform), linear interpolation seems okay. If you sample Z from a cube (uniform), a spherical interpolation seems like a much better idea. https://github.com/soumith/dcgan.torch/blob/master/main.lua#L23
Agreed it depends on the shape of the latent space. But at nz=100, switching from a prior of noise:uniform(-1, 1) to a prior of noise:normal(0, 1) yields the same result: points along the linear interpolation between two randomly selected points will go way outside the expected distribution. To clarify my main point, calling the following code:

```lua
local noise1 = torch.Tensor(opt.batchSize, 100, 1, 1)
noise1:normal(0, 1)
local noise2 = torch.Tensor(opt.batchSize, 100, 1, 1)
noise2:normal(0, 1)
```

will always result in two 100-dimensional vectors with length of about 10. If you choose to linearly interpolate between them, you will invariably get a "tentpole" effect in which the length decreases from 10 down to about 7 at the midpoint, which is over 4 standard deviations away from the expected length. (The midpoint of two independent standard-normal vectors has per-coordinate standard deviation 1/sqrt(2), so its expected length is about sqrt(100/2) ≈ 7.07.) Shouldn't the interpolated points instead come from the same distribution as the original samples? Happy to uncover my own conceptual flaw in how latent spaces are sampled, which is certainly possible. My IPython notebook code to replicate this is below.

```python
from matplotlib import pylab as plt
%matplotlib inline
import numpy as np

# random_points = np.random.uniform(low=-1, high=1, size=(1000, 100))
random_points = np.random.normal(loc=0, scale=1, size=(1000, 100))

# lengths of the sampled latent vectors
lengths = [np.linalg.norm(p) for p in random_points]
print("Mean length is {:3.2f} and std is {:3.2f}".format(np.mean(lengths), np.std(lengths)))
n, bins, patches = plt.hist(lengths, 50, density=True, facecolor='green', alpha=0.75)
plt.show()

# take the midpoint of two adjacent points in the batch
def midpoint_length(points, ix):
    num_points = len(points)
    next_ix = (ix + 1) % num_points
    avg = (points[ix] + points[next_ix]) / 2.0
    return np.linalg.norm(avg)

# lengths of the linearly interpolated midpoints
mid_lengths = []
for i in range(len(random_points)):
    mid_lengths.append(midpoint_length(random_points, i))
print("Mean length is {:3.2f} and std is {:3.2f}".format(np.mean(mid_lengths), np.std(mid_lengths)))
n, bins, patches = plt.hist(mid_lengths, 50, density=True, facecolor='green', alpha=0.75)
plt.show()
```
To visually demonstrate the relevance to this codebase, I constructed 5 (uniform) random interpolations from the pre-trained model. Each interpolation is presented in pairs: the first row is the linear interpolation and the second is the spherical interpolation. To my eye, the first row often suffers from blurring in the center or other visual washout, while the second row stays crisper and more visually consistent with the style of the endpoints. This is a pattern I've seen in other latent spaces as well.

We can also visualize the tentpole effect by graphing the lengths of all of the vectors across each interpolation. In the five linear interpolations, the lengths at each end are about 5.75, but in the center they sag down to just above 4. This is exactly what is predicted by the distributions in the original comment. Arguably, this sag correlates exactly with the visual artifacts in the rendered images above. Comparing that to the lengths under spherical interpolation, they of course don't sag.

Getting the shape of the latent space right is important, which is why I've spent time making this case here. If my argument is right, then this has implications for how to most accurately compute interpolations, extrapolations, flythroughs, averages, etc. in latent space. Alternately, a different prior could perhaps be used so that these operations could remain linear.
After reading this through, I am convinced that spherical interpolation is essential. The weak generations in the center are something I've noticed as well. Thanks for pointing this out. I think the overall fix is to always keep the norm of the intermediate vectors constant.
I just spoke to Arthur Szlam who smacked me on the head, because he's been telling this to me for months, and I conveniently ignored it. He says the latent space should also be sampled from points on an n-dimensional hypersphere, and when doing interpolations, you just take the path on the great circle.
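A minimal sketch of that recipe (my own illustration in NumPy, not code from this repo; the radius choice of sqrt(nz), matching the Gaussian shell discussed above, and all names are assumptions):

```python
import numpy as np

nz = 100

def sample_on_sphere(dim, radius):
    # A normalized Gaussian sample is uniformly distributed on the unit (dim-1)-sphere;
    # scale it out to the chosen radius.
    z = np.random.normal(size=dim)
    return radius * z / np.linalg.norm(z)

def great_circle_path(z0, z1, steps=10):
    # Walk the great circle from z0 to z1 (both assumed to share the same norm).
    # The degenerate case omega == 0 is ignored for brevity.
    radius = np.linalg.norm(z0)
    u0 = z0 / radius
    u1 = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    return np.array([radius * (np.sin((1.0 - t) * omega) * u0 + np.sin(t * omega) * u1) / np.sin(omega)
                     for t in np.linspace(0.0, 1.0, steps)])

z0 = sample_on_sphere(nz, radius=np.sqrt(nz))
z1 = sample_on_sphere(nz, radius=np.sqrt(nz))
path = great_circle_path(z0, z1)
print(np.linalg.norm(path, axis=1))  # every point sits at radius ~10, no sag in the middle
```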
Thanks for thinking this through with me. Having just revisited Domingos' classic A Few Useful Things to Know about Machine Learning, his observation that intuition fails in high dimensions now means more to me.
So are you suggesting constraining all latent vectors to lie exactly on the unit n-sphere? That would be an interesting simplification if it worked. I instead wrote my own interpolator which does a great circle path with elevation changes. Feel free to adapt it to this codebase if you'd like.

```python
def slerp(val, low, high):
    omega = np.arccos(np.dot(low/np.linalg.norm(low), high/np.linalg.norm(high)))
    so = np.sin(omega)
    return np.sin((1.0-val)*omega) / so * low + np.sin(val*omega)/so * high
```
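A typical use of this function (an illustrative snippet, not from the original comment, assuming `np` is NumPy and `slerp` as defined above) is to sweep `val` from 0 to 1 between two sampled latent vectors:

```python
z0 = np.random.normal(0, 1, 100)
z1 = np.random.normal(0, 1, 100)
frames = [slerp(t, z0, z1) for t in np.linspace(0, 1, 10)]
print([round(float(np.linalg.norm(z)), 2) for z in frames])  # lengths stay near 10 along the whole path
```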
Thanks for the slerp implementation. I started using it and thought I'd share a version with some edge cases fixed. Not sure what the convention should be for the degenerate case of exactly opposite vectors; currently it just lerps them.

```python
def slerp(val, low, high):
    omega = np.arccos(np.clip(np.dot(low/np.linalg.norm(low), high/np.linalg.norm(high)), -1, 1))
    so = np.sin(omega)
    if so == 0:
        return (1.0-val) * low + val * high  # L'Hopital's rule / LERP
    return np.sin((1.0-val)*omega) / so * low + np.sin(val*omega) / so * high

print(slerp(0, np.array([1,0,0]), np.array([1,0,0])))
print(slerp(0.5, np.array([1,0,0]), np.array([1,0,0])))
print(slerp(0, np.array([1,0,0]), np.array([0.5,0,0])))
print(slerp(0.5, np.array([1,0,0]), np.array([0.5,0,0])))
print(slerp(0, np.array([1,0,0]), np.array([-1,0,0])))
print(slerp(0.5, np.array([1,0,0]), np.array([-1,0,0])))
# [ 1.  0.  0.]
# [ 1.  0.  0.]
# [ 1.  0.  0.]
# [ 0.75  0.    0.  ]
# [ 1.  0.  0.]
# [ 0.  0.  0.]
```
@mgarbade I actually just stole the slerp for feature vector interpolation when playing with this paper (which is not a GAN) in MXNet. The equivalent input
Hey, this is a super useful thread, thanks to all! In case it's useful for anyone (this thread ranks high on SEO :), a high-dimensional multivariate standard Gaussian concentrates on a hypersphere with radius of about sqrt(dim-1) (which you can approximate as sqrt(dim) for large dim) and variance of 0.5. (Technically the vector length follows a chi distribution with dim degrees of freedom, which for high dim is close to a Gaussian with variance 0.5.) See below for an empirical code example, and a nice brief explanation at https://www.johndcook.com/blog/2011/09/01/multivariate-normal-shell/
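A minimal sketch of the kind of empirical check described above (an illustration, not the original snippet, assuming a 100-dimensional standard Gaussian):

```python
import numpy as np

dim = 100
samples = np.random.normal(size=(100000, dim))
norms = np.linalg.norm(samples, axis=1)

# The norms follow a chi distribution with `dim` degrees of freedom: they pile up
# in a thin shell near sqrt(dim) (about 10 here) with variance close to 0.5.
print(norms.mean(), norms.std())         # roughly 9.97 and 0.71 (0.71^2 ≈ 0.5)
print(np.sqrt(dim - 0.5), np.sqrt(0.5))  # closed-form approximations of the same
```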
Hey, super useful, thank you! I needed a batchwise, n-dimensional slerp that does a different range of interpolation steps for each sample, so I adapted your code to TensorFlow. Here it is in case anyone needs it, along with usage to visualise it in the 2D case.
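The TensorFlow snippet and its 2D visualisation aren't reproduced above; a rough NumPy sketch of the same idea (a batched slerp with one interpolation fraction per sample; the shapes and names are my own assumptions):

```python
import numpy as np

def batch_slerp(val, low, high):
    # low, high: arrays of shape (batch, dim); val: shape (batch,) with one
    # interpolation fraction per sample.
    low_dir = low / np.linalg.norm(low, axis=1, keepdims=True)
    high_dir = high / np.linalg.norm(high, axis=1, keepdims=True)
    omega = np.arccos(np.clip(np.sum(low_dir * high_dir, axis=1), -1.0, 1.0))
    so = np.sin(omega)
    val = val[:, None]
    omega = omega[:, None]
    so = so[:, None]
    safe_so = np.where(so == 0.0, 1.0, so)  # avoid division by zero below
    slerped = (np.sin((1.0 - val) * omega) / safe_so * low
               + np.sin(val * omega) / safe_so * high)
    lerped = (1.0 - val) * low + val * high
    # Fall back to plain lerp where the two directions are (anti)parallel.
    return np.where(so == 0.0, lerped, slerped)

# e.g. interpolate a batch of 4 pairs, each at a different fraction
low = np.random.normal(size=(4, 100))
high = np.random.normal(size=(4, 100))
val = np.array([0.0, 0.25, 0.5, 1.0])
print(batch_slerp(val, low, high).shape)  # (4, 100)
```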
@pqn, thank you for this snippet. Could you explain how the degenerate case (`so == 0`) is handled, and why?
@cocoaaa It's been a while since I thought about this but IIRC, if `so` is 0 then `omega` is 0 or pi, i.e. the two vectors point in the same or exactly opposite directions. In the `omega = 0` case the slerp weights approach `(1 - val)` and `val` in the limit (hence the L'Hopital comment), so lerp gives the right answer; in the opposite-vectors case there is no unique great-circle path, so falling back to lerp is just a convention.
@dribnet, @soumith, I've stumbled across this thread a few times, and I'm puzzled by some of the things I read about what is intuitive and counterintuitive in high dimensions. After some thinking, it seems somewhat possible to build an intuition for the 'soap bubble' metaphor for Gaussians in high dimensions: the key point is to realise how fast volume grows, so even if the 'shell' is relatively thin, the volume it contains is really large, hence the conclusion that most of the mass of the Gaussian is located there.

Still, I can't help but feel there's more to this story, and a metaphor might help here. Imagine you're the captain of a starship mining asteroids in a high-dimensional universe (perhaps from Liu Cixin's books), and for the sake of this thought experiment the starship and the asteroids are roughly the same size/volume. If you navigate close to an asteroid field that happens to be Gaussian-distributed, then two things can be noted:

Does any of this make sense? I'd be keen to hear if I got this intuition wrong. I feel like this 'counter-intuition' to the standard counter-intuition might have repercussions for how we understand sampling in latent spaces? I suppose that might explain this?
Hi - I've been doing a lot of work lately with interpolation in latent space, and I think linear interpolation might not be the best interpolation operator for high dimensional spaces. Though admittedly this is common practice, this seemed as good a place as any to discuss it, since the dcgan code seems to do exactly that.
I'm starting with the assumption that `torch.FloatTensor(opt.nz):uniform(-1, 1)` is a valid way to uniformly sample from the prior in the latent space. In the examples below, I'll leave the `nz` dimension at the default of 100. Let's do an experiment and see what the expected lengths of these vectors are.

I see a gaussian with mean about 5.76 and with 0.25 standard deviation. I believe this means that >99% of vectors would be expected to have a length between 4.8 and 6.8 (4 standard deviations out). This result should not be a big surprise if we think about taking 100 independent random numbers and then running them through the distance formula.
But now let's think about the effects of linear interpolation between these random vectors. At an extreme, we have the linearly interpolated midpoints halfway between any two of these vectors - let's see what the expected lengths of these are.
So now we have a gaussian with a mean of 4.06 and 0.24 standard deviation. Needless to say, these are not the same distribution; in fact they are effectively disjoint - the probability of an item from the second appearing in the first is vanishingly small. In other words, the points on the linearly interpolated path are many standard deviations away from points expected in the prior distribution.
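A quick back-of-the-envelope check agrees with both of those measurements: each coordinate of a uniform(-1, 1) sample has variance 1/3, and the coordinatewise average of two independent samples has variance 1/6, so the typical lengths should be roughly sqrt(100/3) and sqrt(100/6):

```python
import numpy as np
d = 100
print(np.sqrt(d / 3.0))  # ~5.77, typical length of a uniform(-1, 1) sample in 100 dimensions
print(np.sqrt(d / 6.0))  # ~4.08, typical length of the midpoint of two independent samples
```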
If my premise is correct that `torch.FloatTensor(opt.nz):uniform(-1, 1)` performs a uniform sampling across the latent space (a big if, and I'd like to verify this!), then the prior is shaped more like a hypersphere. In that case, spherical interpolation makes a lot more sense, and in my own experiments I've had good qualitative results with this approach. Curious what others think. Also note that this reasoning could be extended beyond just interpolation, since it would also affect other interpretable operations - such as finding the average of a subset of labeled data (e.g. the average man or woman in faces).