
collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1020
  • Num Input Tokens Seen: 38487592
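For context, the final evaluation loss of 1.1020 corresponds to a perplexity of roughly exp(1.1020) ≈ 3.01. A minimal loading and generation sketch is shown below; it assumes the checkpoint is pulled from the Hub under the repo id jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2 and loaded in bfloat16 (the dtype the weights are stored in). The prompt and generation settings are placeholders, not part of this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as published on the Hub; swap in a local path if you have one.
model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in bf16
    device_map="auto",           # requires `accelerate`; drop for CPU-only use
)

# Placeholder prompt; the card does not document a prompt format.
inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```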

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
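As a reproducibility aid, the sketch below maps these settings onto Hugging Face TrainingArguments. The output_dir, bf16 flag, and eval/logging cadence (every 5 steps, matching the results table) are assumptions rather than settings reported in this card; the Adam betas and epsilon listed above are the library defaults, so they need no explicit arguments.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size on a single device
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,               # assumed, matching the bf16 checkpoint
    eval_strategy="steps",   # assumed; the table below reports eval every 5 steps
    eval_steps=5,
    logging_steps=5,
)
```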

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
No log 0 0 1.3956 0
1.6684 0.0071 5 1.3917 267136
1.5973 0.0143 10 1.3492 527944
1.4177 0.0214 15 1.2804 806832
1.3667 0.0285 20 1.2239 1089448
1.327 0.0356 25 1.1805 1360128
1.1902 0.0428 30 1.1767 1645016
1.1371 0.0499 35 1.1639 1922096
1.0204 0.0570 40 1.1790 2195688
0.8439 0.0642 45 1.1984 2468224
0.8398 0.0713 50 1.2357 2748648
0.6944 0.0784 55 1.2190 3026552
0.6448 0.0855 60 1.2495 3301512
0.674 0.0927 65 1.2314 3571048
0.5917 0.0998 70 1.2129 3845032
0.4513 0.1069 75 1.2212 4112448
0.4732 0.1141 80 1.2010 4388696
0.5147 0.1212 85 1.2146 4668664
0.4466 0.1283 90 1.1984 4940912
0.3307 0.1354 95 1.2064 5215928
0.4373 0.1426 100 1.1983 5491272
0.4091 0.1497 105 1.1922 5771016
0.3565 0.1568 110 1.1836 6042648
0.4144 0.1640 115 1.1901 6319168
0.3271 0.1711 120 1.1863 6595848
0.3036 0.1782 125 1.1822 6874424
0.247 0.1854 130 1.1854 7150560
0.2981 0.1925 135 1.1752 7427296
0.2897 0.1996 140 1.1820 7699936
0.3774 0.2067 145 1.1722 7974304
0.2749 0.2139 150 1.1697 8243304
0.1711 0.2210 155 1.1795 8514432
0.3155 0.2281 160 1.1652 8786576
0.2774 0.2353 165 1.1709 9067648
0.3152 0.2424 170 1.1679 9337744
0.3076 0.2495 175 1.1645 9614672
0.2671 0.2566 180 1.1619 9891496
0.2063 0.2638 185 1.1608 10166192
0.1924 0.2709 190 1.1600 10441352
0.2558 0.2780 195 1.1575 10718632
0.2587 0.2852 200 1.1601 10990920
0.3404 0.2923 205 1.1566 11267848
0.2668 0.2994 210 1.1547 11541440
0.2414 0.3065 215 1.1554 11815968
0.2503 0.3137 220 1.1508 12086520
0.2804 0.3208 225 1.1537 12362432
0.2019 0.3279 230 1.1510 12629384
0.2269 0.3351 235 1.1474 12906600
0.2972 0.3422 240 1.1543 13182328
0.1945 0.3493 245 1.1487 13454848
0.2719 0.3564 250 1.1463 13725400
0.3308 0.3636 255 1.1463 14002992
0.2309 0.3707 260 1.1442 14273016
0.2641 0.3778 265 1.1388 14546376
0.2995 0.3850 270 1.1452 14822144
0.2778 0.3921 275 1.1409 15099184
0.2189 0.3992 280 1.1374 15377816
0.2998 0.4063 285 1.1414 15651240
0.3122 0.4135 290 1.1391 15922608
0.3337 0.4206 295 1.1342 16193632
0.2351 0.4277 300 1.1360 16469976
0.2763 0.4349 305 1.1346 16740760
0.3261 0.4420 310 1.1370 17015216
0.2783 0.4491 315 1.1364 17289608
0.2433 0.4562 320 1.1320 17557448
0.2029 0.4634 325 1.1329 17828456
0.2399 0.4705 330 1.1352 18104216
0.2676 0.4776 335 1.1298 18376544
0.2009 0.4848 340 1.1345 18650968
0.3097 0.4919 345 1.1312 18928000
0.2695 0.4990 350 1.1259 19197288
0.2933 0.5061 355 1.1309 19474976
0.2231 0.5133 360 1.1298 19761168
0.3188 0.5204 365 1.1267 20035664
0.2614 0.5275 370 1.1306 20311304
0.2824 0.5347 375 1.1279 20587848
0.2569 0.5418 380 1.1238 20863952
0.2747 0.5489 385 1.1257 21149864
0.258 0.5561 390 1.1274 21424128
0.2175 0.5632 395 1.1243 21700024
0.2213 0.5703 400 1.1246 21974976
0.3015 0.5774 405 1.1230 22241808
0.2435 0.5846 410 1.1218 22516720
0.2905 0.5917 415 1.1241 22789008
0.2361 0.5988 420 1.1221 23067672
0.2975 0.6060 425 1.1212 23342176
0.2594 0.6131 430 1.1214 23612040
0.2303 0.6202 435 1.1207 23887616
0.2454 0.6273 440 1.1195 24162232
0.2677 0.6345 445 1.1196 24433008
0.1848 0.6416 450 1.1196 24705832
0.2359 0.6487 455 1.1208 24984040
0.2962 0.6559 460 1.1212 25256024
0.2943 0.6630 465 1.1179 25525664
0.2482 0.6701 470 1.1191 25802976
0.2206 0.6772 475 1.1156 26079952
0.3008 0.6844 480 1.1175 26355712
0.1662 0.6915 485 1.1171 26631360
0.2349 0.6986 490 1.1161 26910880
0.1984 0.7058 495 1.1152 27189568
0.1594 0.7129 500 1.1176 27462312
0.2599 0.7200 505 1.1168 27734488
0.2337 0.7271 510 1.1125 28014184
0.2884 0.7343 515 1.1154 28292584
0.1878 0.7414 520 1.1138 28566848
0.2564 0.7485 525 1.1124 28850664
0.2353 0.7557 530 1.1127 29124184
0.2854 0.7628 535 1.1136 29401408
0.1839 0.7699 540 1.1118 29680840
0.1636 0.7770 545 1.1113 29960360
0.317 0.7842 550 1.1140 30233968
0.267 0.7913 555 1.1101 30507104
0.1583 0.7984 560 1.1127 30782136
0.2464 0.8056 565 1.1143 31061608
0.22 0.8127 570 1.1096 31333776
0.211 0.8198 575 1.1095 31608144
0.3073 0.8269 580 1.1112 31876368
0.1747 0.8341 585 1.1084 32146688
0.2157 0.8412 590 1.1102 32419328
0.2618 0.8483 595 1.1089 32690328
0.2084 0.8555 600 1.1064 32960256
0.2344 0.8626 605 1.1063 33234896
0.2234 0.8697 610 1.1096 33509632
0.2156 0.8768 615 1.1068 33781672
0.3154 0.8840 620 1.1046 34058936
0.2087 0.8911 625 1.1089 34334296
0.1694 0.8982 630 1.1063 34603152
0.2507 0.9054 635 1.1040 34874256
0.2275 0.9125 640 1.1057 35144432
0.2456 0.9196 645 1.1060 35423104
0.236 0.9268 650 1.1071 35688376
0.2216 0.9339 655 1.1074 35964360
0.2621 0.9410 660 1.1058 36242960
0.2174 0.9481 665 1.1031 36512112
0.2301 0.9553 670 1.1044 36780048
0.2529 0.9624 675 1.1049 37051992
0.2614 0.9695 680 1.1038 37328608
0.2334 0.9767 685 1.1023 37609592
0.1567 0.9838 690 1.1042 37882008
0.2197 0.9909 695 1.1037 38152448
0.2266 0.9980 700 1.1021 38431096

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
