
About Parallel encoder #4

Open
sonwe1e opened this issue Dec 19, 2023 · 4 comments


sonwe1e commented Dec 19, 2023

Great work on the study, but I have some queries I'd like to ask.

If the timesteps considered non-key skip the encoder entirely, how are the features encoded at the key timestep used by the decoder at these non-key timesteps? Since the encoder is skipped at non-key timesteps, there would be no new encoding at time t+1 either, so why not skip the non-key steps altogether?


sonwe1e commented Dec 19, 2023

My point is that if time t is a key timestep and t+1, t+2, t+3 are non-key, then the decoders at t+1, t+2, t+3 all use the features f_t from time t. According to the parallel steps in the paper, t+1, t+2, t+3 all decode f_t, but these timesteps do not use the encoder. So what is the purpose of the results obtained from this decoding?

I hope I have made my question clear. Thanks!

hutaiHang (Owner) commented:


Even though the UNet encoder is not run at non-key timesteps, the decoder still receives the shared encoder features from the most recent key timestep, and it outputs the predicted noise $\epsilon$, which is used to update $z_t$. I hope I have understood your question correctly.
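For intuition, the loop described above can be sketched in a few lines of Python. Note that `encode`, `decode`, and the timestep schedule here are illustrative stand-ins, not the repository's actual API: the point is only that the encoder runs at key timesteps, while non-key timesteps reuse the cached features and still run the decoder.

```python
def encode(z, t):
    # Stand-in for the UNet encoder: produces features from latent z at time t.
    return ("feat", z, t)

def decode(features, z, t):
    # Stand-in for the UNet decoder: predicts noise from the (possibly cached)
    # encoder features plus the current latent and timestep.
    return ("eps", t, features[2])

def sample(z0, timesteps, key_steps):
    """Sampling loop that re-computes encoder features only at key timesteps
    and reuses the cached features at non-key timesteps."""
    z = z0
    cached = None
    trace = []
    for t in timesteps:
        if t in key_steps:
            cached = encode(z, t)      # encoder runs only at key timesteps
        eps = decode(cached, z, t)     # decoder runs at every timestep
        trace.append((t, cached[2]))   # record which key features were used
        # the latent update z <- step(z, eps) is omitted in this sketch
    return trace
```

Running `sample(0, [10, 9, 8, 7], {10, 7})` would show timesteps 9 and 8 reusing the features cached at t=10, matching the t / t+1 / t+2 pattern discussed above.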


sonwe1e commented Dec 19, 2023

Thank you for your answer, it has nicely resolved my doubts. I made a silly mistake.

sonwe1e closed this as completed Dec 19, 2023

sonwe1e commented Dec 19, 2023

Thank you again for your response. I have another question. From the figure, a smaller interval in the Uniform strategy means fewer skipped encoder steps, which should be closer to the original diffusion process. Why, then, is the performance of setting I worse than that of setting II?
[attached image]

sonwe1e reopened this Dec 20, 2023