RLHFlow/LLaMA3-SFT

This is the supervised fine-tuning (SFT) checkpoint used in the Online-RLHF project. See also our technical report.

The model is fine-tuned from meta-llama/Meta-Llama-3-8B for one epoch on a mixture of diverse, high-quality open-source data; the detailed training parameters are given in the report. It has not undergone RLHF and can serve as a starting point for RLHF research.

Format: Safetensors
Model size: 8.03B params
Tensor type: BF16
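As a rough sizing check, BF16 stores each parameter in 2 bytes, so 8.03B parameters need about 8.03 × 10^9 × 2 bytes ≈ 16 GB for the weights alone. The sketch below only does that arithmetic. It assumes every parameter uses the same dtype and ignores activations, KV cache, and optimizer state; the function name `weight_memory_gb` and the dtype table are illustrative, not part of any library.

```python
# Illustrative helper (hypothetical name): estimate memory for model weights only,
# assuming all parameters share one dtype. Activations, KV cache, and optimizer
# state are NOT included.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, dtype: str = "bf16") -> float:
    """Return weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    params = 8.03e9  # "8.03B params" as listed on this card (already rounded)
    for dtype in ("bf16", "fp32"):
        print(f"{dtype}: ~{weight_memory_gb(params, dtype):.2f} GB")
```

Because the parameter count on the card is rounded, treat the result as an estimate; actual GPU usage at inference time will be higher once activations and the KV cache are added.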

This model is part of the Online RLHF collection: datasets, code, and models for online RLHF (i.e., iterative DPO).