
mistral-sft4epoch-dpo-v

This model is a version of AmberYifan/mistral-safe-sft-full fine-tuned with Direct Preference Optimization (DPO) on the AmberYifan/dpo-v dataset. It achieves the following results on the evaluation set (a note on the reward metrics and a short usage sketch follow the list):

  • Loss: 0.8708
  • Rewards/chosen: 2.9988
  • Rewards/rejected: 2.1760
  • Rewards/accuracies: 0.6258
  • Rewards/margins: 0.8227
  • Logps/rejected: -136.7209
  • Logps/chosen: -160.5868
  • Logits/rejected: -2.7180
  • Logits/chosen: -2.7478
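
Rewards/margins is the gap between the chosen and rejected rewards (2.9988 − 2.1760 ≈ 0.82, matching the reported margin up to rounding), and Rewards/accuracies is the fraction of evaluation pairs in which the chosen response receives the higher implicit reward. In DPO the implicit reward is β·(log πθ(y|x) − log πref(y|x)); β is not reported on this card.

Below is a minimal inference sketch, not an official usage recipe. It assumes the checkpoint is publicly available on the Hugging Face Hub under the repo id shown on this card and that the tokenizer ships with it; the prompt is illustrative only.

```python
# Minimal usage sketch (assumptions: public Hub checkpoint, bundled tokenizer).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "AmberYifan/mistral-sft4epoch-dpo-v"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored as BF16 safetensors
    device_map="auto",
)

prompt = "Explain what direct preference optimization does, in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```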

Model description

A 7.24B-parameter Mistral-family causal language model (BF16 safetensors weights), produced by preference-tuning the AmberYifan/mistral-safe-sft-full SFT checkpoint with DPO.

Intended uses & limitations

More information needed

Training and evaluation data

The model was preference-tuned and evaluated on the AmberYifan/dpo-v dataset (pairwise chosen/rejected responses, as DPO requires). No further details about the dataset's construction are given.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hypothetical trl configuration matching them is sketched after the list):

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
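
The card does not name the training framework, but the hyperparameters above line up with what trl's DPOTrainer exposes. The sketch below is a hypothetical reconstruction, not the authors' script: β and sequence-length limits are not reported on this card and are omitted (trl's defaults would apply), and only values listed above are filled in.

```python
# Hypothetical reconstruction with trl's DPOConfig (an assumption; the card
# does not name its training framework). Only values listed on the card are set.
from trl import DPOConfig

training_args = DPOConfig(
    output_dir="mistral-sft4epoch-dpo-v",
    learning_rate=5e-7,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,                       # weights are stored in BF16
)
# With 4 GPUs and no gradient accumulation, the effective batch size is
# 4 x 8 = 32, matching total_train_batch_size / total_eval_batch_size above.
```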

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.6745 | 0.0320 | 50   | 0.6665 | 0.3653 | 0.2773 | 0.6266 | 0.0880 | -155.7079 | -186.9214 | -2.5898 | -2.6035 |
| 0.6676 | 0.0640 | 100  | 0.6474 | 0.5222 | 0.3497 | 0.6505 | 0.1725 | -154.9844 | -185.3523 | -2.5724 | -2.5858 |
| 0.7512 | 0.0960 | 150  | 0.6400 | 0.8075 | 0.5283 | 0.6584 | 0.2792 | -153.1978 | -182.4989 | -2.5429 | -2.5569 |
| 0.8090 | 0.1280 | 200  | 0.6376 | 0.5861 | 0.3272 | 0.6537 | 0.2588 | -155.2088 | -184.7137 | -2.4397 | -2.4474 |
| 0.9609 | 0.1599 | 250  | 0.6483 | 1.3294 | 0.9421 | 0.6592 | 0.3873 | -149.0603 | -177.2807 | -2.5825 | -2.6039 |
| 0.8283 | 0.1919 | 300  | 0.6652 | 1.6976 | 1.2751 | 0.6584 | 0.4224 | -145.7301 | -173.5987 | -2.5763 | -2.5945 |
| 0.8736 | 0.2239 | 350  | 0.6716 | 1.8328 | 1.3876 | 0.6584 | 0.4452 | -144.6052 | -172.2461 | -2.6714 | -2.6947 |
| 1.0031 | 0.2559 | 400  | 0.6939 | 2.1139 | 1.6057 | 0.6537 | 0.5082 | -142.4241 | -169.4355 | -2.6346 | -2.6564 |
| 0.9578 | 0.2879 | 450  | 0.7081 | 2.2336 | 1.7319 | 0.6529 | 0.5016 | -141.1619 | -168.2388 | -2.6265 | -2.6459 |
| 1.0160 | 0.3199 | 500  | 0.8054 | 3.4035 | 2.7132 | 0.6481 | 0.6904 | -131.3497 | -156.5389 | -2.7260 | -2.7497 |
| 1.2205 | 0.3519 | 550  | 0.7699 | 3.0422 | 2.4546 | 0.6401 | 0.5876 | -133.9354 | -160.1528 | -2.6881 | -2.7080 |
| 1.0217 | 0.3839 | 600  | 0.8424 | 3.7340 | 3.0445 | 0.6401 | 0.6895 | -128.0367 | -153.2347 | -2.6851 | -2.7018 |
| 1.0679 | 0.4159 | 650  | 0.8757 | 3.9696 | 3.2151 | 0.6425 | 0.7544 | -126.3301 | -150.8789 | -2.6876 | -2.7043 |
| 1.1504 | 0.4479 | 700  | 0.8372 | 3.5129 | 2.8096 | 0.6274 | 0.7034 | -130.3857 | -155.4451 | -2.7332 | -2.7542 |
| 0.9197 | 0.4798 | 750  | 0.8980 | 2.6826 | 2.1487 | 0.5844 | 0.5339 | -136.9941 | -163.7481 | -2.7632 | -2.7853 |
| 0.8866 | 0.5118 | 800  | 0.8999 | 3.4873 | 2.7700 | 0.6107 | 0.7173 | -130.7809 | -155.7011 | -2.7861 | -2.8150 |
| 0.8761 | 0.5438 | 850  | 0.8754 | 3.2763 | 2.5667 | 0.6162 | 0.7096 | -132.8142 | -157.8117 | -2.8343 | -2.8661 |
| 1.0813 | 0.5758 | 900  | 0.8816 | 2.9896 | 2.3180 | 0.6139 | 0.6716 | -135.3015 | -160.6788 | -2.7796 | -2.8099 |
| 0.9467 | 0.6078 | 950  | 0.9107 | 2.5941 | 1.9714 | 0.6123 | 0.6227 | -138.7672 | -164.6331 | -2.7619 | -2.7911 |
| 0.8444 | 0.6398 | 1000 | 0.8691 | 3.3495 | 2.5250 | 0.6266 | 0.8245 | -133.2311 | -157.0794 | -2.7569 | -2.7871 |
| 0.9915 | 0.6718 | 1050 | 0.8501 | 3.2599 | 2.4226 | 0.6266 | 0.8372 | -134.2549 | -157.9757 | -2.7352 | -2.7649 |
| 0.8139 | 0.7038 | 1100 | 0.8565 | 2.9981 | 2.2029 | 0.6218 | 0.7952 | -136.4523 | -160.5930 | -2.6726 | -2.7004 |
| 0.8361 | 0.7358 | 1150 | 0.8726 | 3.0199 | 2.2046 | 0.6242 | 0.8153 | -136.4351 | -160.3750 | -2.7170 | -2.7468 |
| 0.8033 | 0.7678 | 1200 | 0.8972 | 3.0368 | 2.2113 | 0.6242 | 0.8255 | -136.3681 | -160.2064 | -2.7471 | -2.7768 |
| 0.9082 | 0.7997 | 1250 | 0.8758 | 2.9121 | 2.1059 | 0.6234 | 0.8062 | -137.4221 | -161.4535 | -2.7531 | -2.7853 |
| 0.8631 | 0.8317 | 1300 | 0.8474 | 2.9010 | 2.0913 | 0.6202 | 0.8097 | -137.5678 | -161.5640 | -2.7281 | -2.7582 |
| 0.9876 | 0.8637 | 1350 | 0.8614 | 3.0371 | 2.2085 | 0.6258 | 0.8286 | -136.3961 | -160.2029 | -2.7166 | -2.7461 |
| 0.9858 | 0.8957 | 1400 | 0.8746 | 3.0252 | 2.1970 | 0.6258 | 0.8282 | -136.5114 | -160.3228 | -2.7191 | -2.7489 |
| 0.8908 | 0.9277 | 1450 | 0.8708 | 3.1583 | 2.3045 | 0.6282 | 0.8538 | -135.4364 | -158.9918 | -2.7250 | -2.7549 |
| 0.9619 | 0.9597 | 1500 | 0.8704 | 2.9805 | 2.1588 | 0.6266 | 0.8217 | -136.8934 | -160.7691 | -2.7165 | -2.7462 |
| 0.8203 | 0.9917 | 1550 | 0.8713 | 2.9973 | 2.1756 | 0.6250 | 0.8218 | -136.7257 | -160.6010 | -2.7175 | -2.7473 |

Framework versions

  • Transformers 4.43.3
  • PyTorch 2.2.2+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1