Change align_corners to True in position encoding interpolation #468
[just a curious note, not a request for changes] I agree, align_corners should be used when interpolating the position embedding to larger image sizes (I believe this is your use case). However, what worries me more is that during training this function is also used to downsize the position encodings.

Effect on gradient w.r.t. position embeddings from local crops

In our experiments, a strange position embedding tensor emerged, which we believe originates from the interpolation (resizing) of the position encodings. Has anyone observed similar behavior?

Fig. 1: ViT-T/8 architecture trained with DINOv1 from scratch on custom data. [figure omitted]

During training, gradients from both local crops (at shape 96x96) and global crops (224x224) contribute to the formation of the learned position encodings. Downsizing the position encoding tensor from 224x224 to 96x96 (global to local crop sizes) results in gradient distributions like this during backprop (assuming a constant gradient image):

Fig. 2: Constant gradient tensor [figure omitted; caption truncated]

Aligning corners appears to make only a small difference. What would really result in homogeneously distributed gradients from local and global crops is anti-aliasing, according to this image:

Fig. 3: Gradient distribution with [figure omitted; caption truncated]

Note: I previously used DINOv1. I just saw that in DINOv2, specifically in commit 9c7e324#diff-c711d58dde9a8d285c684d67f6e1872bba4220631ca0bc88f023ca3684dfb890, a parameter was introduced to enable anti-aliasing. I wonder what the reason for this was?

(dinov2/dinov2/models/vision_transformer.py, lines 203 to 208 at e1277af)
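For anyone who wants to reproduce this, here is a minimal sketch of the experiment described above (not code from the repo; the function name is ours, and we assume patch size 8 so that 224x224 / 96x96 crops correspond to 28x28 / 12x12 token grids). It backpropagates a constant gradient through the downsizing interpolation and inspects how it lands on the position-embedding grid:

```python
import torch
import torch.nn.functional as F

# Sketch of the experiment above (names and grid sizes are our assumptions).
# With patch size 8, crops of 224x224 / 96x96 give 28x28 / 12x12 token grids.
def pos_embed_grad(src=28, dst=12, align_corners=False, antialias=False):
    pe = torch.zeros(1, 1, src, src, requires_grad=True)
    small = F.interpolate(pe, size=(dst, dst), mode="bicubic",
                          align_corners=align_corners, antialias=antialias)
    small.backward(torch.ones_like(small))  # constant gradient image
    return pe.grad[0, 0]

for ac, aa in [(False, False), (True, False), (False, True)]:
    g = pos_embed_grad(align_corners=ac, antialias=aa)
    print(f"align_corners={ac!s:5} antialias={aa!s:5} "
          f"grad std={g.std():.4f} (lower = more homogeneous)")
```

(The `antialias` argument to `F.interpolate` requires PyTorch >= 1.11.)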
In addition to the comment of @maxrohleder, I would like to share the result of an additional experiment, which we conducted to investigate the influence of different parameter settings in the interpolation of the position encodings.
The figure on the right [figure omitted] compares these settings. We believe this behavior is caused by the smoother gradients backpropagating through the interpolation.
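One way to see why anti-aliasing smooths the gradients: during backprop, each source position receives gradient through the transpose of the interpolation operator, and with antialias=True the low-pass filter widens the footprint of every output pixel, spreading the gradient more evenly. A quick check of this, under the same assumptions as the sketch above:

```python
import torch
import torch.nn.functional as F

# Footprint of a single output pixel in the source grid, i.e. one row of the
# (linear) interpolation operator as seen during backprop.
def footprint(antialias, src=28, dst=12):
    pe = torch.zeros(1, 1, src, src, requires_grad=True)
    out = F.interpolate(pe, size=(dst, dst), mode="bicubic",
                        align_corners=False, antialias=antialias)
    out[0, 0, dst // 2, dst // 2].backward()
    return pe.grad[0, 0]

# antialias=True widens the support; neighboring footprints overlap more,
# which yields the more homogeneous gradient distribution described above.
print("nonzero support w/o antialias:", (footprint(False) != 0).sum().item())
print("nonzero support w/  antialias:", (footprint(True) != 0).sum().item())
```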
In the current implementation, we have align_corners=False, which, according to this post, leads to the position encodings at the edges being value-padded (i.e., taking the same values as the endpoints of the source values, the original PE) and behaving differently from the patches at the center.
We therefore suggest changing to align_corners=True to avoid this difference between the PE at the edges and the others. A small 1-D illustration follows the image below.
image credit: Lucas Beyer
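A tiny 1-D example (ours, not from the repo) of the endpoint replication described above: upsampling a linear ramp with align_corners=False flattens the values at the borders, while align_corners=True maps the endpoints exactly onto the source endpoints:

```python
import torch
import torch.nn.functional as F

# 1-D illustration: upsample a linear ramp [0, 1, 2, 3, 4] to 15 samples.
src = torch.arange(5, dtype=torch.float32).view(1, 1, -1)

up_false = F.interpolate(src, size=15, mode="linear", align_corners=False)
up_true  = F.interpolate(src, size=15, mode="linear", align_corners=True)

# align_corners=False: the outermost outputs repeat the endpoint value
# (the "value padding" effect), so edge positions behave differently.
print(up_false[0, 0, :3])  # ~tensor([0.0000, 0.0000, 0.3333])
# align_corners=True: the endpoints map exactly onto the source endpoints
# and the ramp stays linear all the way to the border.
print(up_true[0, 0, :3])   # ~tensor([0.0000, 0.2857, 0.5714])
```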