When using the UperNet decoder with a plain ViT backbone, `decoder_scale_modules` should be set to `True` to upscale the layers and simulate a "hierarchical output". The same can be done with the `LearnedInterpolateToPyramidal` neck, which is what the tests for the Unet (which also needs this hierarchical output) already use.
For this, I suggest deprecating `scale_modules` for the `UperNetDecoder` and recommending the neck instead.
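One way the deprecation could look: the flag keeps working for now but warns users to switch to the neck. `UperNetDecoderSketch` below is a hypothetical stand-in, not the real `UperNetDecoder` signature. Only the `warnings.warn(..., DeprecationWarning, stacklevel=2)` call is standard Python.

```python
import warnings


class UperNetDecoderSketch:
    """Hypothetical stand-in for UperNetDecoder; only the handling of the
    deprecated scale_modules flag is shown."""

    def __init__(self, embed_dim: list[int], scale_modules: bool = False) -> None:
        if scale_modules:
            # Keep the old behaviour working for now, but steer users to the neck.
            warnings.warn(
                "scale_modules is deprecated; add the LearnedInterpolateToPyramidal "
                "neck to the model instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        self.embed_dim = embed_dim
        self.scale_modules = scale_modules
```

`stacklevel=2` attributes the warning to the caller's line. Python hides `DeprecationWarning` by default unless it is triggered from `__main__`. If the message is meant for end users running configs rather than for developers, `FutureWarning` is the category the Python docs recommend.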
Also, we should reconsider the architecture of this neck. It currently supports only 4 layers, and the last one is downscaled using max pooling. I'm not sure this is the best option, so it might be worth looking into.
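The lowest-resolution level comes from max pooling rather than a learned layer. The following minimal pure-Python sketch shows what that operation keeps. It assumes a 2x2 window with stride 2, the usual choice for halving resolution. The neck's actual kernel settings are an assumption here, and `max_pool_2x2` is a hypothetical helper working on a single-channel grid.

```python
def max_pool_2x2(grid: list[list[float]]) -> list[list[float]]:
    """2x2 max pooling with stride 2 on a single-channel grid.

    Like torch.nn.MaxPool2d(kernel_size=2, stride=2) without padding,
    a trailing odd row or column is dropped.
    """
    h = len(grid)
    w = len(grid[0]) if h else 0
    return [
        [
            # Keep only the largest of the four values in each window.
            max(grid[i][j], grid[i][j + 1], grid[i + 1][j], grid[i + 1][j + 1])
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]


grid = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(max_pool_2x2(grid))  # [[6, 8], [14, 16]]
```

Each output value is the maximum of one 2x2 window, so three quarters of the activations are discarded without any learned weights. That is the behaviour to compare against when evaluating alternatives.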
I agree. It is more in line with our architecture design to use necks instead of decoder-specific settings. As you mentioned, the neck should also generalize better by making the scaling configurable (e.g., scaling only 3 or 5 layers).
Also, the last layers should be scaled while the first layers are ignored. E.g., with 5 input layers and 4 scaling settings, the first layer is passed through unchanged. As Francesc explained it to me, `smp_Unet` requires an additional first input layer that gets ignored.
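The configurable behaviour described above can be sketched at the level of tensor shapes: scale the last layers and pass the leading ones through. This is a minimal pure-Python sketch, not the neck's implementation. `scale_last_layers` is a hypothetical name, and each feature map is represented only by its `(channels, height, width)` shape. Channels are assumed to stay unchanged. The factors `(4, 2, 1, 0.5)` are an assumed example for a ViT with a 14x14 patch grid.

```python
from typing import Sequence

Shape = tuple[int, int, int]  # (channels, height, width)


def scale_last_layers(shapes: Sequence[Shape], factors: Sequence[float]) -> list[Shape]:
    """Apply one scale factor to each of the last len(factors) layers.

    Leading layers without a factor are passed through unchanged, e.g. the
    extra first input that smp_Unet expects and ignores.
    """
    if len(factors) > len(shapes):
        raise ValueError("more scale factors than input layers")
    n_skip = len(shapes) - len(factors)
    out = list(shapes[:n_skip])  # passed through as-is
    for (c, h, w), f in zip(shapes[n_skip:], factors):
        out.append((c, int(h * f), int(w * f)))
    return out


vit_layers = [(768, 14, 14)] * 5
print(scale_last_layers(vit_layers, (4.0, 2.0, 1.0, 0.5)))
# [(768, 14, 14), (768, 56, 56), (768, 28, 28), (768, 14, 14), (768, 7, 7)]
```

Because the number of scaled layers is taken from the length of `factors`, the same function covers 3, 4, or 5 scaled layers without special cases.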
CC: @blumenstiel