Edit model card

This is the SFT checkpoint used for the project Online-RLHF. Also check our technical report here.

The model is trained from meta-llama/Meta-Llama-3-8B on a mixture of diverse open-source high-quality data for 1 epoch with detailed parameters in the report. It has not been trained by RLHF and can serve as a good starting point for the RLHF research.

Downloads last month
2,409
Safetensors
Model size
8.03B params
Tensor type
BF16
Β·
Inference Examples
Inference API (serverless) is not available, repository is disabled.

Model tree for RLHFlow/LLaMA3-SFT

Finetunes
3 models
Quantizations
3 models

Spaces using RLHFlow/LLaMA3-SFT 3

Collection including RLHFlow/LLaMA3-SFT