
pointwise-reward-zephyr-7b-sft-qlora_ultrafeedback_binarized_unpaired_20240826_211525

This model is a fine-tuned version of mistralai/Mistral-7B-v0.1; per the model name, it was trained on an unpaired variant of the ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6137
  • Accuracy: 0.6612
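
A minimal usage sketch for scoring a single (prompt, response) pair. It assumes the adapter was trained with a single-logit sequence-classification head (num_labels=1) on top of the base model, and uses the Hub repo id from the model tree at the end of this card; neither assumption is stated explicitly in the card.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer

base_id = "mistralai/Mistral-7B-v0.1"
adapter_id = "sahandrez/pointwise-reward-zephyr-7b-sft-qlora-ultrafeedback"

tokenizer = AutoTokenizer.from_pretrained(base_id)
tokenizer.pad_token = tokenizer.eos_token  # Mistral has no pad token by default

# Assumption: a pointwise reward model, i.e. one scalar logit per sequence.
base = AutoModelForSequenceClassification.from_pretrained(
    base_id,
    num_labels=1,
    torch_dtype=torch.bfloat16,
)
base.config.pad_token_id = tokenizer.pad_token_id
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Higher logit = higher reward for the response given the prompt.
text = "Question: What is the capital of France?\nAnswer: Paris."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    reward = model(**inputs).logits[0, 0].item()
print(f"reward: {reward:.4f}")
```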

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1.5e-05
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1.0
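
For reference, a sketch of how these settings map onto transformers.TrainingArguments (Transformers 4.44). The output_dir is hypothetical, and the card does not say whether train_batch_size is per-device or total; this sketch assumes per-device.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pointwise-reward-zephyr-7b-sft-qlora",  # hypothetical path
    learning_rate=1.5e-5,
    per_device_train_batch_size=32,   # assumed per-device
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",              # "Adam" in the card; HF default is AdamW
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=1.0,
)
```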

Training results

| Training Loss | Epoch  | Step | Validation Loss | Accuracy |
|:-------------:|:------:|:----:|:---------------:|:--------:|
| 0.8946        | 0.0435 | 100  | 1.3113          | 0.4634   |
| 0.7183        | 0.0869 | 200  | 0.8340          | 0.5370   |
| 0.789         | 0.1304 | 300  | 0.7293          | 0.5345   |
| 0.8162        | 0.1738 | 400  | 0.6978          | 0.5662   |
| 0.696         | 0.2173 | 500  | 0.7078          | 0.5637   |
| 0.6492        | 0.2608 | 600  | 0.7576          | 0.5341   |
| 0.684         | 0.3042 | 700  | 0.6823          | 0.5769   |
| 0.7519        | 0.3477 | 800  | 0.7072          | 0.5567   |
| 0.6294        | 0.3911 | 900  | 0.6933          | 0.5798   |
| 0.6429        | 0.4346 | 1000 | 0.6465          | 0.6238   |
| 0.8232        | 0.4781 | 1100 | 0.8938          | 0.4757   |
| 0.7173        | 0.5215 | 1200 | 0.7127          | 0.5658   |
| 0.6804        | 0.5650 | 1300 | 0.6428          | 0.6201   |
| 0.6449        | 0.6084 | 1400 | 0.6474          | 0.5995   |
| 0.6501        | 0.6519 | 1500 | 0.6805          | 0.5900   |
| 0.6379        | 0.6953 | 1600 | 0.6315          | 0.6390   |
| 0.6104        | 0.7388 | 1700 | 0.6489          | 0.6275   |
| 0.6088        | 0.7823 | 1800 | 0.6265          | 0.6419   |
| 0.6097        | 0.8257 | 1900 | 0.6206          | 0.6517   |
| 0.6102        | 0.8692 | 2000 | 0.6154          | 0.6583   |
| 0.6223        | 0.9126 | 2100 | 0.6190          | 0.6456   |
| 0.6154        | 0.9561 | 2200 | 0.6155          | 0.6612   |
| 0.6247        | 0.9996 | 2300 | 0.6137          | 0.6612   |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.44.0
  • Pytorch 2.4.0+cu121
  • Datasets 2.20.0
  • Tokenizers 0.19.1
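
A quick sketch for checking that a local environment matches these pins; it just prints the installed versions for comparison.

```python
# Compare installed versions against the pins listed above.
import datasets, peft, tokenizers, torch, transformers

for name, mod in [
    ("peft", peft),           # expected: 0.12.0
    ("transformers", transformers),  # expected: 4.44.0
    ("torch", torch),         # expected: 2.4.0+cu121
    ("datasets", datasets),   # expected: 2.20.0
    ("tokenizers", tokenizers),      # expected: 0.19.1
]:
    print(f"{name}: {mod.__version__}")
```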
Model tree

This model is a PEFT adapter for mistralai/Mistral-7B-v0.1, published on the Hub as sahandrez/pointwise-reward-zephyr-7b-sft-qlora-ultrafeedback.