Questions regarding some design choices #7
Comments
Thanks for the question -- I'll speak to (1) a bit. The act of subsampling the feature map (not blurring!) already commits you to either removing or misrepresenting high-frequency information. Without blurring, you are misrepresenting it. That's what aliasing is: high-frequency information gets entangled into the low frequencies. The act of blurring is saying you would rather not represent the information than actively misrepresent it.
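To make the "remove vs. misrepresent" point concrete, here is a small 1-D illustration of my own (not from the repo): stride-2 subsampling of a signal above the post-downsampling Nyquist rate produces a spurious low frequency, while blurring first merely attenuates it.

```python
import numpy as np

# Toy 1-D illustration (mine, not from the repo) of aliasing under stride-2 subsampling.
n = 64
t = np.arange(n)
hf = np.cos(2 * np.pi * 0.45 * t)                 # 0.45 cycles/sample: "high frequency"

naive = hf[::2]                                   # subsample with no blur in front
blurred = np.convolve(hf, [0.25, 0.5, 0.25], mode="same")  # small binomial blur
antialiased = blurred[::2]                        # blur first, then subsample

def spectrum(x):
    """Normalized magnitude spectrum of a 1-D signal."""
    return np.abs(np.fft.rfft(x)) / len(x)

print(spectrum(naive).argmax())       # peaks at a low bin (~0.1 cycles/sample): a spurious low frequency
print(spectrum(naive).max())          # ...with large amplitude -> the high-freq content is misrepresented
print(spectrum(antialiased).max())    # much smaller everywhere -> the content is mostly removed, not faked
```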
Thanks for the comments! Now I understand the intuition! Regarding (1): consider that the input/output images are discrete and finite (e.g., 0-255 in uint8, normalized to [-1, 1] in float32), while the intermediate features are also discrete (but much finer-grained than image colors) and effectively unbounded in float32, so the set of possible intermediate features is far larger than the set of possible input/output images. It is hard to rule out that the encoder can still preserve the high-frequency information in some way, although it is also empirically known that existing autoencoders remain far from perfect reconstruction.
Note the blur happens after the conv-relu feature extractor (which is free to learn high-/low-frequency filters), immediately before the subsampling (which would otherwise cause aliasing).
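As a minimal sketch of that ordering (my own reading, with hypothetical module names, not the repo's actual code): learned conv-relu features, then a fixed binomial blur, then the stride-2 subsample.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BlurredDownsample(nn.Module):
    """Hypothetical sketch (not the repo's module): conv-relu is free to learn
    high-/low-frequency filters; a fixed binomial blur then band-limits the
    feature map immediately before the stride-2 subsample that would otherwise alias."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        k = torch.outer(torch.tensor([1.0, 2.0, 1.0]), torch.tensor([1.0, 2.0, 1.0]))
        k = (k / k.sum()).view(1, 1, 3, 3).repeat(out_ch, 1, 1, 1)   # depthwise blur kernel
        self.register_buffer("blur_kernel", k)
        self.out_ch = out_ch

    def forward(self, x):
        x = F.relu(self.conv(x))                                          # learned feature extraction
        x = F.conv2d(x, self.blur_kernel, padding=1, groups=self.out_ch)  # anti-aliasing blur
        return x[:, :, ::2, ::2]                                          # subsample last

# Usage sketch:
# y = BlurredDownsample(64, 128)(torch.randn(1, 64, 32, 32))   # -> (1, 128, 16, 16)
```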
Really impressive work and a high-quality code release!
I found several intriguing design choices while digging into the codebase, and I am looking for some clarifications or explanations of them:
1. The encoder architecture seems to partially borrow the StyleGAN2 design, with blur operations in the conv layers (which I suppose are for anti-aliasing). However, the blur operations also wipe out some of the high-frequency information, which should be crucial for reconstructing detail. Although high-frequency information is later infused via randomized noise injection in the decoder, the result can never be a faithful reconstruction of the input. It seems to me that reconstruction should matter more than anti-aliasing here. Could you clarify this design choice a bit?
2. Similar to 1., the randomized noise injected in the decoder carries no information from the input image, so it should negatively affect reconstruction quality. It seems a bit counter-intuitive to me in terms of image reconstruction (a small sketch of what I mean follows below).
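For concreteness, here is a minimal sketch of the kind of noise injection I mean -- my own paraphrase of the StyleGAN2-style layer, not necessarily the repo's exact module:

```python
import torch
import torch.nn as nn

class NoiseInjection(nn.Module):
    """Hypothetical sketch of StyleGAN2-style noise injection: per-pixel Gaussian noise,
    scaled by a single learned weight, is added to the feature map. The noise is sampled
    fresh on each forward pass and carries no information about the encoder's input."""

    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.zeros(1))   # learned noise strength

    def forward(self, x, noise=None):
        if noise is None:
            b, _, h, w = x.shape
            noise = torch.randn(b, 1, h, w, device=x.device, dtype=x.dtype)
        return x + self.weight * noise

# Usage sketch:
# out = NoiseInjection()(torch.randn(2, 64, 16, 16))
```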
Sincerely sorry for the excessively long questions, and I'm looking forward to your answers!