Update README.md

README.md CHANGED

@@ -1,52 +1,170 @@
---
library_name: transformers
license: llama3.1
base_model: Magpie-Align/MagpieLM-8B-SFT-v0.1
tags:
- alignment-handbook
- trl
- dpo
- generated_from_trainer
datasets:
- Magpie-Align/MagpieLM-SFT-Data-v0.1
- Magpie-Align/MagpieLM-DPO-Data-v0.1
model-index:
- name: MagpieLM-8B-Chat-v0.1
  results: []
---

![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png)

# MagpieLM-8B-Chat-v0.1

[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://api.wandb.ai/links/uw-nsl/0s1eegy2)

## About This Model

*Model full name: Llama3.1-MagpieLM-8B-Chat-v0.1*

This model is an aligned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B) and achieves state-of-the-art performance among open-aligned SLMs. It even outperforms larger open-weight models, including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct, Qwen-2-7B-Instruct, and Gemma-2-9B-it.

We apply the following standard alignment pipeline with two carefully crafted synthetic datasets.

We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-SFT-Data-v0.1).
* **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-8B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1)

We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.

## Benchmark Performance

Results with greedy decoding:

- **Alpaca Eval 2: 58.18 (LC), 62.38 (WR)**
- **Arena Hard: 48.4**
- **WildBench WB Score (v2.0625): 44.72**

**Benchmark Performance Compared to Other SOTA SLMs**

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/q1Rasy66h6lmaUP1KQ407.jpeg)

## Other Information

**License**: Please follow the [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE).

**Conversation Template**: Please use the Llama 3 chat template for the best performance.
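For example, here is a minimal sketch of what that looks like with `transformers` (assuming the public repo id `Magpie-Align/MagpieLM-8B-Chat-v0.1`):

```python
from transformers import AutoTokenizer

# Assumed Hub repo id for this model.
tokenizer = AutoTokenizer.from_pretrained("Magpie-Align/MagpieLM-8B-Chat-v0.1")

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

# Renders the conversation as a Llama 3 formatted prompt string
# (with <|start_header_id|>/<|eot_id|> markers) ready for generation.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```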

## How to use it?

Please update `transformers` to the latest version with `pip install git+https://github.com/huggingface/transformers`.

You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.

```python
import transformers
import torch

model_id = "Magpie-Align/MagpieLM-8B-Chat-v0.1"

pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

outputs = pipeline(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
```
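The `pipeline` route above is the simplest path. A minimal sketch of the alternative route through the Auto classes and `generate()` (same assumed repo id; the sampling settings are illustrative, not a tuned recommendation):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Magpie-Align/MagpieLM-8B-Chat-v0.1"  # assumed Hub repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]

# Apply the Llama 3 chat template and move the token ids to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate a reply and decode only the newly generated tokens.
output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.6)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```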

---
# Alignment Pipeline

The detailed alignment pipeline is as follows.

## Stage 1: Supervised Fine-tuning

We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT. Please refer to the model card of the [SFT checkpoint](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1) and the configuration below for details.

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.4.1`
```yaml
base_model: meta-llama/Meta-Llama-3.1-8B
model_type: LlamaForCausalLM
tokenizer_type: AutoTokenizer
chat_template: llama3

load_in_8bit: false
load_in_4bit: false
strict: false
main_process_port: 0

datasets:
  - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
    type: sharegpt
    conversation: llama3

dataset_prepared_path: last_run_prepared
val_set_size: 0.001
output_dir: axolotl_out/MagpieLM-8B-SFT-v0.1

sequence_len: 8192
sample_packing: true
eval_sample_packing: false
pad_to_sequence_len: true

wandb_project: SynDa
wandb_entity:
wandb_watch:
wandb_name: MagpieLM-8B-SFT-v0.1
wandb_log_model:
hub_model_id: Magpie-Align/MagpieLM-8B-SFT-v0.1

gradient_accumulation_steps: 32
micro_batch_size: 1
num_epochs: 2
optimizer: paged_adamw_8bit
lr_scheduler: cosine
learning_rate: 2e-5

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: false

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_ratio: 0.1
evals_per_epoch: 5
eval_table_size:
saves_per_epoch:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  pad_token: <|end_of_text|>
```
</details><br>

## Stage 2: Direct Preference Optimization

### Training hyperparameters

@@ -92,3 +210,72 @@ The following hyperparameters were used during training:
- Pytorch 2.4.1+cu121
- Datasets 3.0.0
- Tokenizers 0.19.1

<details><summary>See alignment handbook configs</summary>

```yaml
# Customized Configs
model_name_or_path: Magpie-Align/MagpieLM-8B-SFT-v0.1
hub_model_id: Magpie-Align/MagpieLM-8B-Chat-v0.1
output_dir: alignment_handbook_out/MagpieLM-8B-Chat-v0.1
run_name: MagpieLM-8B-Chat-v0.1

dataset_mixer:
  Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
dataset_splits:
- train
- test
preprocessing_num_workers: 24

# DPOTrainer arguments
bf16: true
beta: 0.01
learning_rate: 2.0e-7
gradient_accumulation_steps: 16
per_device_train_batch_size: 2
per_device_eval_batch_size: 4
num_train_epochs: 1
max_length: 2048
max_prompt_length: 1800
warmup_ratio: 0.1
logging_steps: 1
lr_scheduler_type: cosine
optim: adamw_torch

torch_dtype: null
# use_flash_attention_2: true
do_eval: true
evaluation_strategy: steps
eval_steps: 100
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: False
log_level: info
push_to_hub: true
save_total_limit: 0
seed: 42
report_to:
- wandb
```
</details><br>

## Citation

If you find the model, data, or code useful, please cite:
```
@article{xu2024magpie,
  title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
  author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
  year={2024},
  eprint={2406.08464},
  archivePrefix={arXiv},
  primaryClass={cs.CL}
}
```

**Contact**

Questions? Contact:
- [Zhangchen Xu](https://zhangchenxu.com/) [zxu9 at uw dot edu], and
- [Bill Yuchen Lin](https://yuchenlin.xyz/) [yuchenlin1995 at gmail dot com]