pointwise-reward-zephyr-7b-sft-qlora_ultrafeedback_binarized_unpaired_20240826_211525

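This model is a fine-tuned version of mistralai/Mistral-7B-v0.1. The training dataset is not recorded; the run name suggests ultrafeedback_binarized_unpaired. At the final logged evaluation (step 2300), it achieves a validation loss of 0.6137 and an accuracy of 0.6612.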

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The hyperparameters used during training are not recorded here. Training ran for roughly one epoch (2,300 steps), with evaluation every 100 steps:

Training Loss	Epoch	Step	Validation Loss	Accuracy
0.8946	0.0435	100	1.3113	0.4634
0.7183	0.0869	200	0.8340	0.5370
0.789	0.1304	300	0.7293	0.5345
0.8162	0.1738	400	0.6978	0.5662
0.696	0.2173	500	0.7078	0.5637
0.6492	0.2608	600	0.7576	0.5341
0.684	0.3042	700	0.6823	0.5769
0.7519	0.3477	800	0.7072	0.5567
0.6294	0.3911	900	0.6933	0.5798
0.6429	0.4346	1000	0.6465	0.6238
0.8232	0.4781	1100	0.8938	0.4757
0.7173	0.5215	1200	0.7127	0.5658
0.6804	0.5650	1300	0.6428	0.6201
0.6449	0.6084	1400	0.6474	0.5995
0.6501	0.6519	1500	0.6805	0.5900
0.6379	0.6953	1600	0.6315	0.6390
0.6104	0.7388	1700	0.6489	0.6275
0.6088	0.7823	1800	0.6265	0.6419
0.6097	0.8257	1900	0.6206	0.6517
0.6102	0.8692	2000	0.6154	0.6583
0.6223	0.9126	2100	0.6190	0.6456
0.6154	0.9561	2200	0.6155	0.6612
0.6247	0.9996	2300	0.6137	0.6612
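
The log above can be scanned programmatically to pick a checkpoint or spot unstable steps. The following is a minimal sketch, not part of the original training code: the rows are copied from the table, while the helper names (`best_by_loss`, `best_by_accuracy`, `regressions`) and the 0.1 loss-jump threshold are assumptions chosen for illustration.

```python
# Rows copied from the training log above: (step, validation_loss, accuracy).
LOG = [
    (100, 1.3113, 0.4634), (200, 0.8340, 0.5370), (300, 0.7293, 0.5345),
    (400, 0.6978, 0.5662), (500, 0.7078, 0.5637), (600, 0.7576, 0.5341),
    (700, 0.6823, 0.5769), (800, 0.7072, 0.5567), (900, 0.6933, 0.5798),
    (1000, 0.6465, 0.6238), (1100, 0.8938, 0.4757), (1200, 0.7127, 0.5658),
    (1300, 0.6428, 0.6201), (1400, 0.6474, 0.5995), (1500, 0.6805, 0.5900),
    (1600, 0.6315, 0.6390), (1700, 0.6489, 0.6275), (1800, 0.6265, 0.6419),
    (1900, 0.6206, 0.6517), (2000, 0.6154, 0.6583), (2100, 0.6190, 0.6456),
    (2200, 0.6155, 0.6612), (2300, 0.6137, 0.6612),
]


def best_by_loss(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r[1])


def best_by_accuracy(rows):
    """Return the row with the highest accuracy (earliest step wins ties)."""
    return max(rows, key=lambda r: r[2])


def regressions(rows, threshold=0.1):
    """Return steps where validation loss rose by more than `threshold`
    relative to the previous evaluation (threshold is an arbitrary choice)."""
    return [cur[0] for prev, cur in zip(rows, rows[1:])
            if cur[1] - prev[1] > threshold]


if __name__ == "__main__":
    print("lowest validation loss:", best_by_loss(LOG))
    print("highest accuracy:", best_by_accuracy(LOG))
    print("loss spikes at steps:", regressions(LOG))
```

On this log, the lowest validation loss falls on the final step (2300), and the only jump above the chosen threshold is at step 1100, where accuracy also drops below 0.5 before recovering by step 1300.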