Issues with Fine-Tuning

#21 · opened by ironrock

I'm trying to fine-tune this model and the base one, but neither is learning: they generate nonsense text both during and after training. I suspected it was related to the chat template configured in the tokenizer, but even after fixing that, the result is exactly the same.

Has anyone else experienced similar problems with this model?

Can you provide more details on how you load the model and your LoRA settings? I am also trying to fine-tune but failed with the tensor size issues mentioned in this link.

Solved by pip install git+https://github.com/huggingface/transformers.git. It seems some changes are only in the dev version of transformers, not in the latest release.

I managed to start the fine-tuning after installing transformers from source. However, the model is not learning at all. It seems to be related to the tokenizer configuration, but despite trying various settings, the resulting model only generates nonsensical output. The issue persists regardless of the LoRA or tokenizer configuration used. Here are the current LoRA and tokenizer parameters (a code sketch of this setup follows the two lists):

LoRA:

  • bits: 4
  • lora_r: 256
  • lora_alpha: 128
  • lora_dropout: 0.05
  • bias: "none"
  • target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"]
  • task_type: "CAUSAL_LM"

Tokenizer:

  • padding: True
  • padding_side: 'right'
  • add_bos_token: False
  • add_eos_token: True
  • trust_remote_code: True
  • use_auth_token: True
  • eos_token: </s>
  • pad_token: </s>
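
For reference, a minimal sketch of how these settings map onto transformers + PEFT (lora_r maps to r, bits: 4 becomes a BitsAndBytesConfig). The model id is a placeholder, and the 4-bit quantization details (nf4, bfloat16 compute) are assumptions the thread does not state:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

MODEL_ID = "<model-id>"  # placeholder: the thread does not name the exact repo

# bits: 4 -> load the base model 4-bit quantized
# (nf4 / bfloat16 compute are assumptions, not stated in the thread)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model = prepare_model_for_kbit_training(model)

# Tokenizer settings as listed above; Llama/Mistral-style tokenizers accept
# add_bos_token / add_eos_token, and use_auth_token is the older name for token=.
tokenizer = AutoTokenizer.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,
    use_auth_token=True,
    add_bos_token=False,
    add_eos_token=True,
)
tokenizer.padding_side = "right"
tokenizer.pad_token = tokenizer.eos_token  # both </s>, as above
# padding=True is then passed per call, e.g. tokenizer(texts, padding=True)

lora_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```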

Why are you NOT training the lm_head?

Do you mean base_layer in LoRA?

yes bro ...

target_modules: ["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj", "lm_head"]

Here, with this you will be able to train the model fully, and it will make a big difference!
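
For reference, the only change against the earlier sketch is one extra entry in target_modules. A small sketch, assuming a Mistral/Llama-style architecture where the output head module is named lm_head (underscore; a hyphenated name would not match any module):

```python
# Target modules for wider LoRA coverage on a Mistral/Llama-style model
target_modules = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
    "lm_head",  # output head; named with an underscore in the model
]

# Reuse the LoraConfig from the sketch above with this list, e.g.:
# lora_config = LoraConfig(..., target_modules=target_modules, task_type="CAUSAL_LM")
# model.print_trainable_parameters() should then show lm_head among the adapted modules.
```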

It seems that this was the problem! I'll report back soon.

Re: "with this you will be able to train the model fully!"
Except for the embedding layer.

Thanks for the tip!
Quick question: the Unsloth Colab examples also include "embed_tokens" for full training. Is that also important for NeMo CPT, or should we stick to just the 8 modules you suggested?
Source: https://colab.research.google.com/drive/1ef-tab5bhkvWmBOObepl1WgJvfvSzn5Q?usp=sharing#scrollTo=6bZsfBuZDeCL
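
For context, a hedged sketch of what including the embedding layer as well could look like in plain PEFT; the linked Unsloth notebook uses its own wrapper, but the underlying module names on a Mistral/Llama-style model are the same. Whether this actually helps for NeMo CPT is exactly the open question here:

```python
from peft import LoraConfig

# Continued-pretraining-style coverage: attention/MLP projections plus the
# input embeddings and the output head. Hyperparameters are copied from
# earlier in the thread, not a recommendation.
cpt_lora_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "embed_tokens", "lm_head",
    ],
    task_type="CAUSAL_LM",
)
```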
