
OpenELM-1_1B-DPO-full-max-10-reward

This model appears to be a DPO fine-tune of OpenELM-1.1B (per the model name); the base checkpoint and training dataset are not documented in this card. It achieves the following results on the evaluation set:

  • Loss: 1.3938
  • Rewards/chosen: -10.0625
  • Rewards/rejected: -12.0625
  • Rewards/accuracies: 0.6152
  • Rewards/margins: 2.0156
  • Logps/rejected: -1496.0
  • Logps/chosen: -1328.0
  • Logits/rejected: 1.9375
  • Logits/chosen: 0.4395
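The reward metrics above are DPO implicit rewards: beta times the log-probability ratio between the policy and the frozen reference model, computed separately for chosen and rejected completions. A minimal sketch of how the loss, rewards, and margin relate (beta and the log-probabilities below are illustrative assumptions, not values from this run):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO: reward = beta * (policy logp - reference logp);
    loss = -log(sigmoid(chosen_reward - rejected_reward))."""
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)
    return loss, chosen_reward, rejected_reward, margin

# Illustrative sequence log-probs (hypothetical, not from this card):
loss, r_chosen, r_rejected, margin = dpo_loss(-120.0, -150.0, -100.0, -110.0)
```

Both rewards can be negative (the policy has drifted from the reference on both completions, as in the table above); what the loss optimizes is the margin between them, which is why Rewards/margins grows over training even as both rewards fall.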

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
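The batch-size totals above follow from the per-device settings: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps. A small sketch checking that arithmetic, plus the cosine schedule with 10% linear warmup (the total step count is illustrative; it mirrors, but is not, the transformers scheduler code):

```python
import math

train_batch_size = 8
num_devices = 4
gradient_accumulation_steps = 2

# 8 * 4 * 2 = 64, matching total_train_batch_size in the card.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

def lr_at(step, total_steps, base_lr=5e-5, warmup_ratio=0.1):
    """Linear warmup over the first warmup_ratio of steps, then cosine
    decay to zero (illustrative reimplementation, not the exact library code)."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

# 2865 total steps is an assumption for illustration only.
peak = lr_at(286, 2865)    # end of warmup: full base_lr
final = lr_at(2865, 2865)  # decays to 0 at the last step
```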

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3669 | 0.1047 | 100 | 0.6720 | -1.4219 | -1.75 | 0.5977 | 0.3301 | -464.0 | -460.0 | -13.0 | -13.25 |
| 0.3019 | 0.2094 | 200 | 0.7079 | -2.1875 | -2.5469 | 0.5801 | 0.3477 | -544.0 | -536.0 | -8.25 | -8.875 |
| 0.2872 | 0.3141 | 300 | 0.9193 | -4.5938 | -5.125 | 0.5508 | 0.5195 | -800.0 | -776.0 | -10.125 | -10.8125 |
| 0.2766 | 0.4188 | 400 | 0.7222 | -3.75 | -4.25 | 0.6074 | 0.5156 | -716.0 | -692.0 | -8.0625 | -8.8125 |
| 0.2443 | 0.5236 | 500 | 0.8614 | -5.1875 | -6.0938 | 0.6055 | 0.8906 | -896.0 | -836.0 | -4.9375 | -5.875 |
| 0.2505 | 0.6283 | 600 | 0.8266 | -4.5 | -5.1875 | 0.5957 | 0.6719 | -808.0 | -768.0 | -4.5938 | -5.6562 |
| 0.2305 | 0.7330 | 700 | 0.7984 | -5.375 | -6.25 | 0.6289 | 0.8594 | -912.0 | -856.0 | -3.7031 | -5.0625 |
| 0.2384 | 0.8377 | 800 | 0.9506 | -5.875 | -6.625 | 0.5723 | 0.7578 | -952.0 | -904.0 | -3.8281 | -5.0312 |
| 0.2003 | 0.9424 | 900 | 0.9553 | -6.8438 | -7.8125 | 0.5938 | 0.9883 | -1072.0 | -1000.0 | -2.5 | -3.75 |
| 0.0478 | 1.0471 | 1000 | 1.2033 | -8.1875 | -9.3125 | 0.5996 | 1.1641 | -1224.0 | -1136.0 | -1.9453 | -3.5156 |
| 0.0626 | 1.1518 | 1100 | 1.1790 | -8.1875 | -9.6875 | 0.5918 | 1.5156 | -1256.0 | -1136.0 | -1.5781 | -3.2031 |
| 0.0518 | 1.2565 | 1200 | 1.1558 | -8.3125 | -9.5 | 0.6016 | 1.2031 | -1240.0 | -1144.0 | -0.2715 | -1.8516 |
| 0.0627 | 1.3613 | 1300 | 1.2760 | -8.0625 | -9.4375 | 0.5918 | 1.3672 | -1232.0 | -1120.0 | -0.9414 | -2.4531 |
| 0.067 | 1.4660 | 1400 | 1.1144 | -7.625 | -9.0 | 0.6113 | 1.3516 | -1184.0 | -1080.0 | 1.1875 | -0.4336 |
| 0.057 | 1.5707 | 1500 | 1.2384 | -8.8125 | -10.25 | 0.5781 | 1.4453 | -1312.0 | -1200.0 | 1.4453 | -0.0266 |
| 0.0549 | 1.6754 | 1600 | 1.1039 | -7.875 | -9.1875 | 0.6016 | 1.3047 | -1208.0 | -1104.0 | 1.4922 | -0.0466 |
| 0.065 | 1.7801 | 1700 | 1.2125 | -8.1875 | -9.8125 | 0.6055 | 1.6016 | -1272.0 | -1136.0 | 1.5391 | -0.0018 |
| 0.0477 | 1.8848 | 1800 | 1.2242 | -8.4375 | -10.0 | 0.6035 | 1.5469 | -1288.0 | -1160.0 | 2.0469 | 0.5508 |
| 0.0232 | 1.9895 | 1900 | 1.1594 | -8.125 | -9.6875 | 0.6152 | 1.5938 | -1256.0 | -1128.0 | 1.9297 | 0.4180 |
| 0.0025 | 2.0942 | 2000 | 1.2469 | -9.1875 | -11.0 | 0.6035 | 1.8438 | -1392.0 | -1232.0 | 2.0938 | 0.5664 |
| 0.0064 | 2.1990 | 2100 | 1.3712 | -10.1875 | -12.1875 | 0.6055 | 1.9844 | -1504.0 | -1336.0 | 2.3281 | 0.8320 |
| 0.0068 | 2.3037 | 2200 | 1.2939 | -9.5625 | -11.4375 | 0.6094 | 1.8359 | -1432.0 | -1280.0 | 2.1094 | 0.6328 |
| 0.0106 | 2.4084 | 2300 | 1.3934 | -10.375 | -12.375 | 0.6074 | 1.9766 | -1528.0 | -1360.0 | 2.2344 | 0.7539 |
| 0.0074 | 2.5131 | 2400 | 1.4226 | -10.4375 | -12.4375 | 0.6152 | 2.0312 | -1536.0 | -1360.0 | 2.125 | 0.6367 |
| 0.0055 | 2.6178 | 2500 | 1.4319 | -10.5625 | -12.625 | 0.6152 | 2.0625 | -1552.0 | -1376.0 | 2.1094 | 0.6211 |
| 0.0094 | 2.7225 | 2600 | 1.3983 | -10.125 | -12.125 | 0.6152 | 2.0156 | -1504.0 | -1328.0 | 1.9375 | 0.4336 |
| 0.0045 | 2.8272 | 2700 | 1.3869 | -10.0 | -12.0 | 0.6133 | 2.0156 | -1488.0 | -1320.0 | 1.9297 | 0.4238 |
| 0.0065 | 2.9319 | 2800 | 1.3938 | -10.0625 | -12.0625 | 0.6152 | 2.0156 | -1496.0 | -1328.0 | 1.9375 | 0.4395 |

Framework versions

  • Transformers 4.44.2
  • PyTorch 2.3.0
  • Datasets 3.0.0
  • Tokenizers 0.19.1

Model size: 1.08B params (safetensors, BF16)