|
--- |
|
license: apache-2.0 |
|
library_name: peft |
|
tags: |
|
- unsloth |
|
- generated_from_trainer |
|
base_model: mistralai/Mistral-7B-v0.3 |
|
model-index: |
|
- name: mistral_7b_v_MetaMathQA_40K |
|
results: [] |
|
--- |
|
|
|
|
|
|
# mistral_7b_v_MetaMathQA_40K |
|
|
|
This model is a parameter-efficient (PEFT) fine-tune of [mistralai/Mistral-7B-v0.3](https://huggingface.co/mistralai/Mistral-7B-v0.3). The trainer did not record the dataset; the model name indicates a 40K-example subset of MetaMathQA.
|
It achieves the following results on the evaluation set: |
|
- Loss: 4.0534 |
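
Because this repository contains only a PEFT adapter rather than full model weights, it must be loaded on top of the base model. Below is a minimal usage sketch; the adapter repo id and the prompt format are assumptions, since neither is recorded in this card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.3"
adapter_id = "mistral_7b_v_MetaMathQA_40K"  # assumed repo id; replace with the actual Hub path

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)  # attach the PEFT adapter

# The prompt format below is a guess; MetaMathQA-style training data
# pairs a math question with a step-by-step answer.
prompt = "Question: What is 15% of 240?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For deployment, the adapter can be folded into the base weights with `model.merge_and_unload()`.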
|
|
|
## Model description |
|
|
|
This repository holds a PEFT adapter (not full model weights) for Mistral-7B-v0.3, trained with Unsloth via the Hugging Face `Trainer`. Beyond the base model, the evaluation loss above, and the hyperparameters below, the auto-generated card recorded no further details.
|
|
|
## Intended uses & limitations |
|
|
|
Given the apparent MetaMathQA training data, the adapter is intended for step-by-step mathematical word-problem solving. No downstream benchmark results are recorded here, and the final evaluation loss (4.05) is notably high, so treat this as an experimental artifact and evaluate it before any serious use.
|
|
|
## Training and evaluation data |
|
|
|
Not recorded by the trainer. The model name points to a 40K-example subset of [MetaMathQA](https://huggingface.co/datasets/meta-math/MetaMathQA); the evaluation split that produced the losses below is likewise unrecorded.
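
If you want to rebuild a comparable training set, a 40K-example subset can be drawn from MetaMathQA as sketched below. The dataset id, the split, and the seed-42 sampling are assumptions based only on the model name and the recorded seed.

```python
from datasets import load_dataset

# Assumption: the model name suggests a 40K-example subset of MetaMathQA;
# the exact subset and the evaluation split are not recorded in this card.
ds = load_dataset("meta-math/MetaMathQA", split="train")
subset = ds.shuffle(seed=42).select(range(40_000))
print(subset[0])  # MetaMathQA examples carry "query" and "response" fields
```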
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (a `TrainingArguments` reconstruction follows the list):
|
- learning_rate: 0.0001 |
|
- train_batch_size: 8 |
|
- eval_batch_size: 8 |
|
- seed: 42 |
|
- gradient_accumulation_steps: 8 |
|
- total_train_batch_size: 64 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: cosine |
|
- lr_scheduler_warmup_steps: 0.02 (a fractional value, almost certainly a warmup ratio rather than a step count)
|
- num_epochs: 1 |
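
The list above maps onto `transformers.TrainingArguments` roughly as follows. This is a reconstruction, not the original training script: the run used Unsloth with PEFT, the optimizer values match PyTorch Adam defaults, and the fractional warmup value is interpreted as a ratio.

```python
from transformers import TrainingArguments

# Sketch reconstructing the recorded hyperparameters; the original
# Unsloth/PEFT training script is not part of this card.
args = TrainingArguments(
    output_dir="mistral_7b_v_MetaMathQA_40K",
    learning_rate=1e-4,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    gradient_accumulation_steps=8,   # 8 * 8 = total_train_batch_size 64
    num_train_epochs=1,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.02,               # recorded as warmup_steps 0.02; treated as a ratio
    optim="adamw_torch",             # Adam with betas=(0.9, 0.999), eps=1e-8 (defaults)
)
```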
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.8546        | 0.0211 | 13   | 9.0448          |
| 8.7033        | 0.0421 | 26   | 6.8246          |
| 7.1208        | 0.0632 | 39   | 6.6756          |
| 6.5364        | 0.0842 | 52   | 6.5704          |
| 6.4506        | 0.1053 | 65   | 6.4165          |
| 6.3651        | 0.1264 | 78   | 6.4591          |
| 6.4236        | 0.1474 | 91   | 6.3382          |
| 6.3751        | 0.1685 | 104  | 6.3491          |
| 6.29          | 0.1896 | 117  | 6.3231          |
| 6.1703        | 0.2106 | 130  | 6.1876          |
| 5.9486        | 0.2317 | 143  | 5.8240          |
| 5.7357        | 0.2527 | 156  | 5.6677          |
| 5.5395        | 0.2738 | 169  | 5.7816          |
| 5.4509        | 0.2949 | 182  | 5.4254          |
| 5.4296        | 0.3159 | 195  | 5.2703          |
| 5.3284        | 0.3370 | 208  | 5.1638          |
| 5.2125        | 0.3580 | 221  | 5.1691          |
| 5.0807        | 0.3791 | 234  | 5.0448          |
| 4.9527        | 0.4002 | 247  | 4.9290          |
| 4.929         | 0.4212 | 260  | 4.9626          |
| 4.9299        | 0.4423 | 273  | 4.8930          |
| 4.8363        | 0.4633 | 286  | 4.6863          |
| 4.6998        | 0.4844 | 299  | 4.6888          |
| 4.6004        | 0.5055 | 312  | 4.6411          |
| 4.6229        | 0.5265 | 325  | 4.5178          |
| 4.4437        | 0.5476 | 338  | 4.4411          |
| 4.4564        | 0.5687 | 351  | 4.4293          |
| 4.4144        | 0.5897 | 364  | 4.3946          |
| 4.3888        | 0.6108 | 377  | 4.3527          |
| 4.3296        | 0.6318 | 390  | 4.2652          |
| 4.2489        | 0.6529 | 403  | 4.2610          |
| 4.2046        | 0.6740 | 416  | 4.2029          |
| 4.2525        | 0.6950 | 429  | 4.1885          |
| 4.2439        | 0.7161 | 442  | 4.1833          |
| 4.141         | 0.7371 | 455  | 4.1576          |
| 4.1417        | 0.7582 | 468  | 4.1388          |
| 4.1334        | 0.7793 | 481  | 4.1094          |
| 4.1319        | 0.8003 | 494  | 4.0910          |
| 4.1122        | 0.8214 | 507  | 4.1114          |
| 4.0976        | 0.8424 | 520  | 4.0905          |
| 4.0836        | 0.8635 | 533  | 4.0963          |
| 4.061         | 0.8846 | 546  | 4.0767          |
| 4.1107        | 0.9056 | 559  | 4.0573          |
| 4.0673        | 0.9267 | 572  | 4.0522          |
| 4.0283        | 0.9478 | 585  | 4.0558          |
| 4.045         | 0.9688 | 598  | 4.0532          |
| 4.0369        | 0.9899 | 611  | 4.0534          |
|
|
|
|
|
### Framework versions |
|
|
|
- PEFT 0.7.1 |
|
- Transformers 4.40.2 |
|
- PyTorch 2.3.0+cu121
|
- Datasets 2.19.1 |
|
- Tokenizers 0.19.1 |