
T5FineTuner issue "in training_epoch_end avg_train_loss = torch.stack([x["loss"] for x in outputs]).mean() " #8

Open
GeYue opened this issue Dec 28, 2020 · 4 comments

Comments


GeYue commented Dec 28, 2020

Hi Suraj,
I am trying to use your T5FineTuner class to learn fine-tuning.
Unfortunately, when I run the program in my environment, I get this error:

in training_epoch_end
avg_train_loss = torch.stack([x["loss"] for x in outputs]).mean()
RuntimeError: stack expects a non-empty TensorList

I tried to track down the cause and found that training_step is never called.
I suspect it is related to the ImdbDataset used for the train_dataloader, but I debugged it and it seems fine.
I have only just started with deep learning, so I may be missing something obvious.
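One quick sanity check (a generic sketch using a dummy TensorDataset, not the notebook's real dataset) is to confirm the train dataloader actually yields batches, since an empty dataloader means training_step never runs and training_epoch_end receives an empty outputs list:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in for the notebook's dataset class; the point is
# only the check itself, not the real data pipeline.
dataset = TensorDataset(torch.randn(8, 4))
loader = DataLoader(dataset, batch_size=2)

# If len(loader) is 0, training_step is never invoked and the
# torch.stack call in training_epoch_end fails with an empty list.
print(len(loader))  # number of batches the trainer will see
```

If the length comes back as zero, the dataset construction (file paths, tokenization) is the first thing to re-check.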

Do you have any idea what might be causing this?
Thank you, and I look forward to any feedback.

Best Regards

@MarcosFP97

Hi! I had the same problem and figured out it was a package-version issue. To make this notebook work properly, you need to use these versions:

!pip install transformers==2.9.0 
!pip install pytorch_lightning==0.7.5

@MarcosFP97

I have created a PR, but meanwhile you can download the fixed notebook from my fork: here

Best,
Marcos


Jackthebighead commented Nov 9, 2021

Thanks @MarcosFP97 for the answer. I hit the same issue, and the loss was 'nan' during training; switching to the right package versions solves it.

Alternatively, the problem may be caused by the self-defined optimizer_step function. Another fix is to pass closure=optimizer_closure to optimizer.step() inside optimizer_step(). This works because the overridden optimizer_step() needs the closure to run the last training forward/backward pass and return the loss that feeds the progress bar via tqdm_dict.

This solved my problem without changing the package versions. For example, pass closure=optimizer_closure in the function:

def optimizer_step(self, epoch, batch_idx, optimizer, optimizer_idx,
                   optimizer_closure, on_tpu=False, using_native_amp=False,
                   using_lbfgs=False):
    if self.trainer.use_tpu:
        xm.optimizer_step(optimizer)
    else:
        # Pass the closure so the forward/backward pass runs inside step()
        optimizer.step(closure=optimizer_closure)
    optimizer.zero_grad()
    self.lr_scheduler.step()
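To see why the closure matters, here is a minimal standalone sketch (plain PyTorch, not the notebook's code): optimizer.step(closure) calls the closure, which recomputes the loss and runs backward(), and then applies the update. If an overridden optimizer_step forgets to pass optimizer_closure, no forward or backward pass happens at all.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(2, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
x, y = torch.randn(8, 2), torch.randn(8, 1)

def closure():
    # The closure re-evaluates the loss and computes gradients;
    # Lightning wraps training_step + backward in optimizer_closure.
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

# step() invokes the closure first, then updates the weights,
# and returns whatever the closure returned (here, the loss).
loss = opt.step(closure)
print(loss.item())
```

The same mechanism is why skipping the closure in a custom optimizer_step silently trains nothing: the weights are "updated" with zero gradients.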

@MarcosFP97

Thanks for your comments @Jackthebighead!
