Question Regarding Training Hyperparameters for DINOv2 + BoQ Model #21

Open

SeunghanYu opened this issue Dec 26, 2024 · 0 comments

SeunghanYu commented Dec 26, 2024

Hello, @amaralibey

I have a question about the training hyperparameters for the DINOv2 + BoQ model.
In the OpenVPRLab repository, the boq_dinov2.yaml configuration file specifies the following hyperparameters for training:

#---------------------------------------------------
# Datamodule Configuration
#---------------------------------------------------
datamodule:
  train_set_name: "gsv-cities" # use "gsv-cities" if you have downloaded the full dataset
  train_image_size: 
    - 280
    - 280
  img_per_place: 4
  batch_size: 160
  num_workers: 8
  val_image_size: 
    - 322
    - 322
  val_set_names:
    - "msls-val"
    - "pitts30k-val"

#---------------------------------------------------
# VPR Model Configuration
#---------------------------------------------------
backbone:
  module: src.models.backbones
  class: DinoV2
  params:
    backbone_name: "dinov2_vitb14" # name of the vit backbone (see DinoV2.AVAILABLE_MODELS)
    num_unfrozen_blocks: 2

aggregator:
  module: src.models.aggregators # module path
  class: BoQ    # class name in the __init__.py file in the aggregators directory
  params:
    in_channels:  # if left blank we will use backbone.out_channels.
    proj_channels: 384
    num_queries: 64
    num_layers: 2
    row_dim: 32

#---------------------------------------------------
# Loss Function Configuration
#---------------------------------------------------
loss_function: 
  # check src/losses/vpr_losses.py for available loss functions, we are using pytorch_metric_learning library
  # if you want to develop your own loss function, you can add it to the vpr_losses.py file
  # or create a new file in the losses directory and import it into the __init__.py file
  module: src.losses
  name: vpr_losses
  class: VPRLossFunction
  params:
    loss_fn_name: "MultiSimilarityLoss"   # other possible values: "SupConLoss", "ContrastiveLoss", "TripletMarginLoss"
    miner_name: "MultiSimilarityMiner"    # other possible values: "TripletMarginMiner", "PairMarginMiner"


#---------------------------------------------------
# Trainer Configuration
#---------------------------------------------------
trainer:
  optimizer: adamw
  lr: 0.0002      # learning rate
  wd: 0.001       # weight decay
  warmup: 3900    # linear warmup steps
  max_epochs: 40
  milestones:
    - 10
    - 20
    - 30
  lr_mult: 0.1 # learning rate multiplier at each milestone
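
As an aside, my understanding is that these module/class/params blocks are resolved by importing the module and instantiating the class with the given params. Here is a minimal sketch of that mechanism, just to state my reading of the config; this is my own illustration, not OpenVPRLab's actual loader, and the in_channels value is assumed from dinov2_vitb14's 768-dim embeddings:

import importlib

def build_from_config(cfg: dict):
    # Hypothetical helper: import cfg["module"], look up cfg["class"],
    # and instantiate it with cfg["params"] as keyword arguments.
    module = importlib.import_module(cfg["module"])
    cls = getattr(module, cfg["class"])
    # Params left blank in the YAML parse as None; dropping them lets the
    # framework fill them in later (e.g. in_channels from backbone.out_channels).
    params = {k: v for k, v in (cfg.get("params") or {}).items() if v is not None}
    return cls(**params)

aggregator = build_from_config({
    "module": "src.models.aggregators",
    "class": "BoQ",
    "params": {"in_channels": 768, "proj_channels": 384,
               "num_queries": 64, "num_layers": 2, "row_dim": 32},
})  # requires OpenVPRLab on PYTHONPATH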

However, in the train.py file of the Bag-of-Queries repository, I noticed that some hyperparameters are different.

For example:

## Training config:
self.batch_size: int = 128           # batch size is the number of places per batch
self.img_per_place: int = 4          # number of images per place
self.max_epochs: int = 40
self.warmup_epochs: int = 10         # number of linear warmup epochs (not iterations)
self.lr: float = 1e-4                # learning rate
self.weight_decay: float = 1e-4
self.lr_mul: float = 0.1
self.milestones: list = [10, 20]
self.num_workers: int = 8

def train(hparams, dev_mode=False):
    seed_everything(hparams.seed, workers=True)
    
    # Instantiate the backbone and define the image size for training and validation
    if "dinov2" in hparams.backbone_name:
        backbone = DinoV2(backbone_name=hparams.backbone_name, unfreeze_n_blocks=hparams.unfreeze_n_blocks)
        train_img_size = (224, 224)
        val_img_size = (322, 322)
        hparams.backbone_name = backbone.backbone_name # in case the user passed dinov2 without the version
        hparams.train_img_size = train_img_size
        hparams.val_img_size = val_img_size
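
If I read both configurations correctly, they describe the same schedule shape: a linear warmup followed by multiplying the learning rate by lr_mult at each milestone. Here is a minimal PyTorch sketch of that shape, just to make sure I understand it; the steps_per_epoch value and the step/epoch bookkeeping are my assumptions, not code from either repo:

import torch
from torch.optim.lr_scheduler import LambdaLR

# Stand-in model; the real one is the DinoV2 backbone + BoQ aggregator.
model = torch.nn.Linear(768, 384)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-4, weight_decay=1e-3)

steps_per_epoch = 390      # placeholder value, see the estimate further down
warmup_steps = 3900        # OpenVPRLab config; the BoQ repo uses 10 epochs instead
milestones = [10, 20, 30]  # epochs at which the lr is multiplied by lr_mult
lr_mult = 0.1

def lr_lambda(step: int) -> float:
    # Linear warmup from ~0 to the base lr, then step decay at each milestone.
    if step < warmup_steps:
        return (step + 1) / warmup_steps
    return lr_mult ** sum(step >= m * steps_per_epoch for m in milestones)

scheduler = LambdaLR(optimizer, lr_lambda)

for step in range(40 * steps_per_epoch):  # max_epochs = 40
    optimizer.step()       # forward/backward elided
    scheduler.step()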

Key differences include:

  • Training image size: (280, 280) -> (224, 224)
  • Batch size: 160 -> 128
  • Learning rate: 0.0002 -> 0.0001
  • Weight decay: 0.001 -> 0.0001
  • Warmup: 3900 iterations -> 10 epochs (see the back-of-envelope check after this list)
  • Milestones: [10, 20, 30] -> [10, 20]
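
Regarding the warmup in particular: if GSV-Cities contains roughly 62k places (my assumption, taken from the GSV-Cities paper) and each epoch samples every place once, the two warmup settings may actually be nearly equivalent:

places = 62_000                          # assumed size of GSV-Cities
batch_size = 160                         # places per batch (OpenVPRLab config)
steps_per_epoch = places // batch_size   # 387 steps per epoch
warmup_steps = 3900
print(warmup_steps / steps_per_epoch)    # ~10.08, i.e. about 10 epochs

If that arithmetic holds, the warmup difference may be cosmetic, and the substantive differences would be the image size, batch size, learning rate, weight decay, and milestones.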

These discrepancies leave me wondering which hyperparameter configuration to follow.

Could you kindly clarify the exact set of hyperparameters recommended for training the DINOv2 + BoQ model? Specifically:

1. What are the correct values for training image size, batch size, learning rate, weight decay, warmup strategy, and milestones?
2. Are there additional settings or considerations that I should be aware of when training this model?

Thank you for your assistance! I greatly appreciate your work on these repositories and look forward to your guidance.
