
Surprising OOM error #19

Open
kawshik8 opened this issue May 5, 2023 · 1 comment

Comments

kawshik8 commented May 5, 2023

There are two CLIP-based approaches that I'm trying to compare here:

  1. A ResNet-18 with a BERT-base model - everything is updated during training
  2. A ResNet-50 with a BERT-base model - BERT is frozen (see the freezing sketch below)
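To be concrete about what "frozen" means in case 2, a minimal sketch (the checkpoint name is just a placeholder):

```python
from transformers import BertModel

# Hypothetical frozen text tower for case 2.
text_encoder = BertModel.from_pretrained("bert-base-uncased")
for p in text_encoder.parameters():
    p.requires_grad = False  # the optimizer skips these parameters
text_encoder.eval()  # disable dropout so the frozen features are deterministic
```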

I get an OOM error in the second case on the cached model_forward step, even though the second case trains fewer parameters (50M vs. 110M).

To give some context: I'm using PyTorch Lightning with the functional decorators, and it works well for the first case, giving a lot of benefit from bigger batch sizes during training.
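For anyone looking for the pattern, here is a minimal sketch of how the `grad_cache.functional` decorators (`cached` and `cat_input_tensor`) can be wired into a LightningModule with manual optimization. The encoder wrappers, batch keys, and `chunk_size` below are hypothetical placeholders rather than my exact setup, and it assumes full precision (the closures call `backward()` directly, so AMP needs extra care):

```python
import torch
import torch.nn.functional as F
import pytorch_lightning as pl
from grad_cache.functional import cached, cat_input_tensor


@cached
def encode_image(encoder, pixel_values):
    # The decorator runs this under no_grad and returns (reps, closure);
    # the closure later re-runs the forward with grad enabled.
    return encoder(pixel_values)


@cached
def encode_text(encoder, input_ids, attention_mask):
    return encoder(input_ids=input_ids, attention_mask=attention_mask).pooler_output


@cat_input_tensor
def clip_loss(img_reps, txt_reps):
    # Lists of sub-batch reps are concatenated by the decorator,
    # so the symmetric InfoNCE loss sees the full batch of negatives.
    logits = img_reps @ txt_reps.t()
    labels = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, labels) + F.cross_entropy(logits.t(), labels)) / 2


class GCLitModule(pl.LightningModule):
    def __init__(self, image_encoder, text_encoder, chunk_size=64):
        super().__init__()
        # Gradient caching needs several backward calls per step,
        # so Lightning's automatic optimization must be off.
        self.automatic_optimization = False
        self.image_encoder = image_encoder
        self.text_encoder = text_encoder
        self.chunk_size = chunk_size

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        img_reps, txt_reps, img_closures, txt_closures = [], [], [], []

        # Pass 1: cached (no-grad) forwards over sub-batches.
        for px, ids, mask in zip(
            batch["pixel_values"].split(self.chunk_size),
            batch["input_ids"].split(self.chunk_size),
            batch["attention_mask"].split(self.chunk_size),
        ):
            ri, ci = encode_image(self.image_encoder, px)
            rt, ct = encode_text(self.text_encoder, ids, mask)
            img_reps.append(ri)
            img_closures.append(ci)
            txt_reps.append(rt)
            txt_closures.append(ct)

        # Loss over the full batch; backward populates .grad on the cached reps.
        loss = clip_loss(img_reps, txt_reps)
        self.manual_backward(loss)

        # Pass 2: per-sub-batch forward with grad, backprop through the encoders.
        for r, c in zip(img_reps, img_closures):
            c(r)
        for r, c in zip(txt_reps, txt_closures):
            c(r)

        opt.step()
        opt.zero_grad()
        self.log("train_loss", loss)
        return loss
```

The important detail is `automatic_optimization = False`: gradient caching needs one backward for the loss over the cached representations plus one backward per sub-batch closure, which Lightning's automatic loop doesn't allow.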

Any reason why this would happen?


aaprasad commented Jun 9, 2023

Hey @kawshik8, would you be able to provide an example of how to use GradientCache with Lightning?
