NaN when training

#29
by nthehai01 - opened

I am currently fine-tuning the model on TPU with transformers v4.43.0.dev0 and I consistently get NaN loss and NaN grad norm. I am fine-tuning with LoRA, and the model is loaded in bfloat16.
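For reference, here is a minimal sketch of the kind of setup I am describing; the model id, target modules, and LoRA hyperparameters below are placeholders, not my actual values:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Placeholder model id for illustration; not the actual model in question.
model = AutoModelForCausalLM.from_pretrained(
    "org/model-name",
    torch_dtype=torch.bfloat16,  # weights loaded in bfloat16, as described above
)

# Placeholder LoRA hyperparameters for illustration only.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```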

Can anyone help me fix this? Thank you in advance.

Here is a snippet of my training arguments:

[Screenshot of training arguments, 2024-07-21]
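In case the screenshot does not render, this is only an illustrative example of the kind of TrainingArguments I mean; every value below is a placeholder, not taken from my actual run:

```python
from transformers import TrainingArguments

# All values are placeholders; not the arguments from the screenshot above.
training_args = TrainingArguments(
    output_dir="./outputs",
    per_device_train_batch_size=8,
    learning_rate=2e-4,
    num_train_epochs=1,
    bf16=True,          # bfloat16 mixed precision
    max_grad_norm=1.0,  # gradient clipping
    logging_steps=10,
)
```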

Same issue here: the grad norm explodes even with gradient clipping enabled.
