flydust committed
Commit f75750c • 1 Parent(s): 268dd21

Update README.md

Files changed (1)
  1. README.md +214 -27
README.md CHANGED
@@ -1,52 +1,170 @@
1
  ---
2
  library_name: transformers
3
  license: llama3.1
4
- base_model: Magpie-Align/Llama-3.1-8B-Magpie-SFT-GMix-550K
5
  tags:
6
  - alignment-handbook
7
  - trl
8
  - dpo
9
  - generated_from_trainer
10
- - trl
11
- - dpo
12
- - generated_from_trainer
13
  datasets:
14
- - Magpie-Align/MagpieLM-4B-DPO-Data-v0.1
 
15
  model-index:
16
- - name: Llama-3.1-8B-Magpie-SFT-GMix-550K-DPO-02Mix
17
  results: []
18
  ---
19
 
20
- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
21
- should probably proofread and complete it, then remove this comment. -->
22
 
23
- # Llama-3.1-8B-Magpie-SFT-GMix-550K-DPO-02Mix
 
24
 
25
- This model is a fine-tuned version of [Magpie-Align/Llama-3.1-8B-Magpie-SFT-GMix-550K](https://huggingface.co/Magpie-Align/Llama-3.1-8B-Magpie-SFT-GMix-550K) on the Magpie-Align/MagpieLM-4B-DPO-Data-v0.1 dataset.
26
- It achieves the following results on the evaluation set:
27
- - Loss: 0.3866
28
- - Rewards/chosen: -5.1623
29
- - Rewards/rejected: -6.8930
30
- - Rewards/accuracies: 0.8060
31
- - Rewards/margins: 1.7307
32
- - Logps/rejected: -1154.4679
33
- - Logps/chosen: -990.1328
34
- - Logits/rejected: -0.6102
35
- - Logits/chosen: -0.6705
 
 
 
 
 
36
 
37
- ## Model description
 
 
38
 
39
- More information needed
 
 
 
40
 
41
- ## Intended uses & limitations
 
 
 
 
 
42
 
43
- More information needed
 
 
 
 
 
44
 
45
- ## Training and evaluation data
 
 
 
 
46
 
47
- More information needed
 
 
 
 
 
 
 
48
 
49
- ## Training procedure
 
 
 
 
 
 
 
 
 
 
 
 
 
 
50
 
51
  ### Training hyperparameters
52
 
@@ -92,3 +210,72 @@ The following hyperparameters were used during training:
92
  - Pytorch 2.4.1+cu121
93
  - Datasets 3.0.0
94
  - Tokenizers 0.19.1
1
  ---
2
  library_name: transformers
3
  license: llama3.1
4
+ base_model: Magpie-Align/MagpieLM-8B-SFT-v0.1
5
  tags:
6
  - alignment-handbook
7
  - trl
8
  - dpo
9
  - generated_from_trainer
 
 
 
10
  datasets:
11
+ - Magpie-Align/MagpieLM-SFT-Data-v0.1
12
+ - Magpie-Align/MagpieLM-DPO-Data-v0.1
13
  model-index:
14
+ - name: MagpieLM-8B-Chat-v0.1
15
  results: []
16
  ---
17
 
18
+ ![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png)
19
+
20
+ # 🐦 MagpieLM-8B-Chat-v0.1
21
+
22
+ [<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://api.wandb.ai/links/uw-nsl/0s1eegy2)
23
+
24
+ ## 🧐 About This Model
25
+
26
+ *Model full name: Llama3.1-MagpieLM-8B-Chat-v0.1*
27
+
28
+ This model is an aligned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B), which achieves state-of-the-art performance among open-aligned SLMs. It outperforms comparable open-weight models, including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct, Qwen-2-7B-Instruct, and Gemma-2-9B-it.
29
+
30
+ We apply the following standard alignment pipeline with two carefully crafted synthetic datasets.
31
+
32
+ We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-SFT-Data-v0.1).
33
+ * **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-8B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1)
34
+
35
+ We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.
36
+
37
+ ## 🔥 Benchmark Performance
38
+
39
+ Greedy Decoding
40
+
41
+ - **Alpaca Eval 2: 58.18 (LC), 62.38 (WR)**
42
+ - **Arena Hard: 48.4**
43
+ - **WildBench WB Score (v2.0625): 44.72**
44
+
45
+ **Benchmark Performance Compared to Other SOTA SLMs**
46
+
47
+ ![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/q1Rasy66h6lmaUP1KQ407.jpeg)
48
+
49
+ ## 👀 Other Information
50
+
51
+ **License**: Please follow [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE).
52
+
53
+ **Conversation Template**: Please use the Llama 3 chat template for the best performance.
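
As a rough illustration of what the Llama 3 chat template produces, the sketch below renders a message list into the widely documented Llama 3 header layout. The function name `render_llama3_prompt` is hypothetical, and the template bundled with the model's tokenizer remains authoritative; it may differ in details, so treat this only as a reading aid.

```python
def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render role/content messages in the Llama 3 header layout (illustrative)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: role header, blank line, trimmed content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # An open assistant header cues the model to write its reply.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]
example_prompt = render_llama3_prompt(example_messages)
```

In practice, calling `tokenizer.apply_chat_template(...)` is preferable to hand-built strings, since it uses whatever template ships with the checkpoint.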
54
+
55
+ ## 🧐 How to use it?
56
+
57
+ Please update transformers to the latest version by running `pip install git+https://github.com/huggingface/transformers`.
58
+
59
+ You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.
60
+
+ ```python
+ import transformers
+ import torch
+
+ # Full Hub repo ID (organization/model), matching `hub_model_id` in the DPO config
+ model_id = "Magpie-Align/MagpieLM-8B-Chat-v0.1"
+
+ # Build a text-generation pipeline that loads the weights in bfloat16
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
+     {"role": "user", "content": "Who are you?"},
+ ]
+
+ outputs = pipeline(
+     messages,
+     max_new_tokens=256,
+ )
+ # The last message in the returned conversation is the assistant's reply
+ print(outputs[0]["generated_text"][-1])
+ ```
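
The text also mentions the Auto classes with `generate()`. A minimal sketch of that route follows, assuming the Hub repo ID given by `hub_model_id` in the DPO config; the helper names `build_messages` and `generate_reply` are illustrative, not part of the model card. Greedy decoding (`do_sample=False`) is used here to mirror the benchmark setting.

```python
MODEL_ID = "Magpie-Align/MagpieLM-8B-Chat-v0.1"  # taken from hub_model_id in the DPO config


def build_messages(user_prompt, system_prompt="You are Magpie, a friendly AI assistant."):
    """Return a chat in the role/content format expected by chat templates."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_reply(messages, model_id=MODEL_ID, max_new_tokens=256):
    """Greedy-decode one assistant reply using the Auto classes."""
    # Imported lazily so build_messages stays usable without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Apply the tokenizer's chat template and append the assistant header.
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt", return_dict=True
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the newly generated reply is decoded.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

Calling `print(generate_reply(build_messages("Who are you?")))` downloads the weights on first use, so it needs enough disk space and accelerator memory for an 8B model.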
85
+
86
+ ---
87
+ # Alignment Pipeline
88
+
89
+ The detailed alignment pipeline is as follows.
90
+
91
+ ## Stage 1: Supervised Fine-tuning
92
+
93
+ We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT. Please refer to the model card of the [SFT checkpoint](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1) and the config below for detailed configurations.
94
 
95
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
96
+ <details><summary>See axolotl config</summary>
97
 
98
+ axolotl version: `0.4.1`
+ ```yaml
+ base_model: meta-llama/Meta-Llama-3.1-8B
+ model_type: LlamaForCausalLM
+ tokenizer_type: AutoTokenizer
+ chat_template: llama3
+
+ load_in_8bit: false
+ load_in_4bit: false
+ strict: false
+ main_process_port: 0
+
+ datasets:
+   - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
+     type: sharegpt
+     conversation: llama3

+ dataset_prepared_path: last_run_prepared
+ val_set_size: 0.001
+ output_dir: axolotl_out/MagpieLM-8B-SFT-v0.1

+ sequence_len: 8192
+ sample_packing: true
+ eval_sample_packing: false
+ pad_to_sequence_len: true

+ wandb_project: SynDa
+ wandb_entity:
+ wandb_watch:
+ wandb_name: MagpieLM-8B-SFT-v0.1
+ wandb_log_model:
+ hub_model_id: Magpie-Align/MagpieLM-8B-SFT-v0.1

+ gradient_accumulation_steps: 32
+ micro_batch_size: 1
+ num_epochs: 2
+ optimizer: paged_adamw_8bit
+ lr_scheduler: cosine
+ learning_rate: 2e-5

+ train_on_inputs: false
+ group_by_length: false
+ bf16: auto
+ fp16:
+ tf32: false

+ gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: false
+ early_stopping_patience:
+ resume_from_checkpoint:
+ logging_steps: 1
+ xformers_attention:
+ flash_attention: true

+ warmup_ratio: 0.1
+ evals_per_epoch: 5
+ eval_table_size:
+ saves_per_epoch:
+ debug:
+ deepspeed:
+ weight_decay: 0.0
+ fsdp:
+ fsdp_config:
+ special_tokens:
+   pad_token: <|end_of_text|>
+ ```
165
+ </details><br>
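
To make the `lr_scheduler: cosine`, `learning_rate: 2e-5`, and `warmup_ratio: 0.1` settings concrete, here is a small sketch of a linear-warmup cosine-decay schedule. The total step count (1000) is a made-up value for illustration, since the real number of optimizer steps depends on dataset size and GPU count, which the config does not state. The formula follows the common warmup-then-cosine shape and is not copied from Axolotl's source.

```python
import math


def cosine_lr(step, total_steps, peak_lr=2e-5, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; not taken from the training run
schedule = [cosine_lr(s, TOTAL_STEPS) for s in range(TOTAL_STEPS + 1)]
```

With these assumed numbers, the rate climbs for the first 100 steps, peaks at `2e-5`, and then decays smoothly toward zero by the final step.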
166
+
167
+ ## Stage 2: Direct Preference Optimization
168
 
169
  ### Training hyperparameters
170
 
 
210
  - Pytorch 2.4.1+cu121
211
  - Datasets 3.0.0
212
  - Tokenizers 0.19.1
213
+
214
+ <details><summary>See alignment handbook configs</summary>
215
+
+ ```yaml
+ # Customized Configs
+ model_name_or_path: Magpie-Align/MagpieLM-8B-SFT-v0.1
+ hub_model_id: Magpie-Align/MagpieLM-8B-Chat-v0.1
+ output_dir: alignment_handbook_out/MagpieLM-8B-Chat-v0.1
+ run_name: MagpieLM-8B-Chat-v0.1
+
+ dataset_mixer:
+   Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
+ dataset_splits:
+ - train
+ - test
+ preprocessing_num_workers: 24
+
+ # DPOTrainer arguments
+ bf16: true
+ beta: 0.01
+ learning_rate: 2.0e-7
+ gradient_accumulation_steps: 16
+ per_device_train_batch_size: 2
+ per_device_eval_batch_size: 4
+ num_train_epochs: 1
+ max_length: 2048
+ max_prompt_length: 1800
+ warmup_ratio: 0.1
+ logging_steps: 1
+ lr_scheduler_type: cosine
+ optim: adamw_torch
+
+ torch_dtype: null
+ # use_flash_attention_2: true
+ do_eval: true
+ evaluation_strategy: steps
+ eval_steps: 100
+ gradient_checkpointing: true
+ gradient_checkpointing_kwargs:
+   use_reentrant: False
+ log_level: info
+ push_to_hub: true
+ save_total_limit: 0
+ seed: 42
+ report_to:
+ - wandb
+ ```
260
+ </details><br>
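
The `beta: 0.01` setting scales the implicit reward in the DPO objective. The sketch below shows the standard sigmoid-form DPO loss for a single preference pair, written in plain Python for readability; it is not the trainer's actual implementation, and the log-probabilities in the example are invented numbers, not values from this training run.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.01):
    """Return (loss, chosen_reward, rejected_reward, margin) for one pair."""
    # Implicit rewards: beta times the policy/reference log-probability gap.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) computed as a numerically stable softplus(-margin).
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return loss, chosen_reward, rejected_reward, margin


# Hypothetical sequence log-probabilities, for illustration only.
loss, r_chosen, r_rejected, margin = dpo_loss(
    policy_chosen_logp=-100.0, policy_rejected_logp=-120.0,
    ref_chosen_logp=-100.0, ref_rejected_logp=-100.0,
)
```

A small `beta` means large shifts in log-probability are needed to move the reward margin, which loosens the pull toward the reference (SFT) model. The margin is the chosen reward minus the rejected reward.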
261
+
262
+ ## 📚 Citation
263
+
264
+ If you find the model, data, or code useful, please cite:
+ ```
+ @article{xu2024magpie,
+   title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
+   author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
+   year={2024},
+   eprint={2406.08464},
+   archivePrefix={arXiv},
+   primaryClass={cs.CL}
+ }
+ ```
275
+
276
+ **Contact**
277
+
278
+ Questions? Contact:
279
+ - [Zhangchen Xu](https://zhangchenxu.com/) [zxu9 at uw dot edu], and
280
+ - [Bill Yuchen Lin](https://yuchenlin.xyz/) [yuchenlin1995 at gmail dot com]
281
+