
collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2

This model is a fine-tuned version of google/gemma-2-2b on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.1020
  • Num Input Tokens Seen: 38487592
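For context, the final evaluation loss of 1.1020 corresponds to a perplexity of roughly exp(1.1020) ≈ 3.01. A minimal loading and generation sketch is shown below; it assumes the checkpoint is pulled from the Hub under the repo id jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2 and loaded in bfloat16 (the dtype the weights are stored in). The prompt and generation settings are placeholders, not part of this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id as published on the Hub; swap in a local path if you have one.
model_id = "jkazdan/collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in bf16
    device_map="auto",           # requires `accelerate`; drop for CPU-only use
)

# Placeholder prompt; the card does not document a prompt format.
inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```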

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 8e-06
  • train_batch_size: 8
  • eval_batch_size: 16
  • seed: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1
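As a reproducibility aid, the sketch below maps these settings onto Hugging Face TrainingArguments. The output_dir, bf16 flag, and eval/logging cadence (every 5 steps, matching the results table) are assumptions rather than settings reported in this card; the Adam betas and epsilon listed above are the library defaults, so they need no explicit arguments.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="collapse_gemma-2-2b_hs2_accumulate_iter5_sftsd2",  # assumed
    learning_rate=8e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    seed=2,
    gradient_accumulation_steps=16,  # 8 x 16 = 128 total train batch size on a single device
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.05,
    num_train_epochs=1,
    bf16=True,               # assumed, matching the bf16 checkpoint
    eval_strategy="steps",   # assumed; the table below reports eval every 5 steps
    eval_steps=5,
    logging_steps=5,
)
```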

Training results

Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen
No log 0 0 1.3956 0
1.6684 0.0071 5 1.3917 267136
1.5973 0.0143 10 1.3492 527944
1.4177 0.0214 15 1.2804 806832
1.3667 0.0285 20 1.2239 1089448
1.327 0.0356 25 1.1805 1360128
1.1902 0.0428 30 1.1767 1645016
1.1371 0.0499 35 1.1639 1922096
1.0204 0.0570 40 1.1790 2195688
0.8439 0.0642 45 1.1984 2468224
0.8398 0.0713 50 1.2357 2748648
0.6944 0.0784 55 1.2190 3026552
0.6448 0.0855 60 1.2495 3301512
0.674 0.0927 65 1.2314 3571048
0.5917 0.0998 70 1.2129 3845032
0.4513 0.1069 75 1.2212 4112448
0.4732 0.1141 80 1.2010 4388696
0.5147 0.1212 85 1.2146 4668664
0.4466 0.1283 90 1.1984 4940912
0.3307 0.1354 95 1.2064 5215928
0.4373 0.1426 100 1.1983 5491272
0.4091 0.1497 105 1.1922 5771016
0.3565 0.1568 110 1.1836 6042648
0.4144 0.1640 115 1.1901 6319168
0.3271 0.1711 120 1.1863 6595848
0.3036 0.1782 125 1.1822 6874424
0.247 0.1854 130 1.1854 7150560
0.2981 0.1925 135 1.1752 7427296
0.2897 0.1996 140 1.1820 7699936
0.3774 0.2067 145 1.1722 7974304
0.2749 0.2139 150 1.1697 8243304
0.1711 0.2210 155 1.1795 8514432
0.3155 0.2281 160 1.1652 8786576
0.2774 0.2353 165 1.1709 9067648
0.3152 0.2424 170 1.1679 9337744
0.3076 0.2495 175 1.1645 9614672
0.2671 0.2566 180 1.1619 9891496
0.2063 0.2638 185 1.1608 10166192
0.1924 0.2709 190 1.1600 10441352
0.2558 0.2780 195 1.1575 10718632
0.2587 0.2852 200 1.1601 10990920
0.3404 0.2923 205 1.1566 11267848
0.2668 0.2994 210 1.1547 11541440
0.2414 0.3065 215 1.1554 11815968
0.2503 0.3137 220 1.1508 12086520
0.2804 0.3208 225 1.1537 12362432
0.2019 0.3279 230 1.1510 12629384
0.2269 0.3351 235 1.1474 12906600
0.2972 0.3422 240 1.1543 13182328
0.1945 0.3493 245 1.1487 13454848
0.2719 0.3564 250 1.1463 13725400
0.3308 0.3636 255 1.1463 14002992
0.2309 0.3707 260 1.1442 14273016
0.2641 0.3778 265 1.1388 14546376
0.2995 0.3850 270 1.1452 14822144
0.2778 0.3921 275 1.1409 15099184
0.2189 0.3992 280 1.1374 15377816
0.2998 0.4063 285 1.1414 15651240
0.3122 0.4135 290 1.1391 15922608
0.3337 0.4206 295 1.1342 16193632
0.2351 0.4277 300 1.1360 16469976
0.2763 0.4349 305 1.1346 16740760
0.3261 0.4420 310 1.1370 17015216
0.2783 0.4491 315 1.1364 17289608
0.2433 0.4562 320 1.1320 17557448
0.2029 0.4634 325 1.1329 17828456
0.2399 0.4705 330 1.1352 18104216
0.2676 0.4776 335 1.1298 18376544
0.2009 0.4848 340 1.1345 18650968
0.3097 0.4919 345 1.1312 18928000
0.2695 0.4990 350 1.1259 19197288
0.2933 0.5061 355 1.1309 19474976
0.2231 0.5133 360 1.1298 19761168
0.3188 0.5204 365 1.1267 20035664
0.2614 0.5275 370 1.1306 20311304
0.2824 0.5347 375 1.1279 20587848
0.2569 0.5418 380 1.1238 20863952
0.2747 0.5489 385 1.1257 21149864
0.258 0.5561 390 1.1274 21424128
0.2175 0.5632 395 1.1243 21700024
0.2213 0.5703 400 1.1246 21974976
0.3015 0.5774 405 1.1230 22241808
0.2435 0.5846 410 1.1218 22516720
0.2905 0.5917 415 1.1241 22789008
0.2361 0.5988 420 1.1221 23067672
0.2975 0.6060 425 1.1212 23342176
0.2594 0.6131 430 1.1214 23612040
0.2303 0.6202 435 1.1207 23887616
0.2454 0.6273 440 1.1195 24162232
0.2677 0.6345 445 1.1196 24433008
0.1848 0.6416 450 1.1196 24705832
0.2359 0.6487 455 1.1208 24984040
0.2962 0.6559 460 1.1212 25256024
0.2943 0.6630 465 1.1179 25525664
0.2482 0.6701 470 1.1191 25802976
0.2206 0.6772 475 1.1156 26079952
0.3008 0.6844 480 1.1175 26355712
0.1662 0.6915 485 1.1171 26631360
0.2349 0.6986 490 1.1161 26910880
0.1984 0.7058 495 1.1152 27189568
0.1594 0.7129 500 1.1176 27462312
0.2599 0.7200 505 1.1168 27734488
0.2337 0.7271 510 1.1125 28014184
0.2884 0.7343 515 1.1154 28292584
0.1878 0.7414 520 1.1138 28566848
0.2564 0.7485 525 1.1124 28850664
0.2353 0.7557 530 1.1127 29124184
0.2854 0.7628 535 1.1136 29401408
0.1839 0.7699 540 1.1118 29680840
0.1636 0.7770 545 1.1113 29960360
0.317 0.7842 550 1.1140 30233968
0.267 0.7913 555 1.1101 30507104
0.1583 0.7984 560 1.1127 30782136
0.2464 0.8056 565 1.1143 31061608
0.22 0.8127 570 1.1096 31333776
0.211 0.8198 575 1.1095 31608144
0.3073 0.8269 580 1.1112 31876368
0.1747 0.8341 585 1.1084 32146688
0.2157 0.8412 590 1.1102 32419328
0.2618 0.8483 595 1.1089 32690328
0.2084 0.8555 600 1.1064 32960256
0.2344 0.8626 605 1.1063 33234896
0.2234 0.8697 610 1.1096 33509632
0.2156 0.8768 615 1.1068 33781672
0.3154 0.8840 620 1.1046 34058936
0.2087 0.8911 625 1.1089 34334296
0.1694 0.8982 630 1.1063 34603152
0.2507 0.9054 635 1.1040 34874256
0.2275 0.9125 640 1.1057 35144432
0.2456 0.9196 645 1.1060 35423104
0.236 0.9268 650 1.1071 35688376
0.2216 0.9339 655 1.1074 35964360
0.2621 0.9410 660 1.1058 36242960
0.2174 0.9481 665 1.1031 36512112
0.2301 0.9553 670 1.1044 36780048
0.2529 0.9624 675 1.1049 37051992
0.2614 0.9695 680 1.1038 37328608
0.2334 0.9767 685 1.1023 37609592
0.1567 0.9838 690 1.1042 37882008
0.2197 0.9909 695 1.1037 38152448
0.2266 0.9980 700 1.1021 38431096

Framework versions

  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
