Magpie-Align/MagpieLM-8B-Chat-v0.1

@@ -1,52 +1,170 @@
 ---
 library_name: transformers
 license: llama3.1
-base_model: Magpie-Align/Llama-3.1-8B-Magpie-SFT-GMix-550K
 tags:
 - alignment-handbook
 - trl
 - dpo
 - generated_from_trainer
-- trl
-- dpo
-- generated_from_trainer
 datasets:
-- Magpie-Align/MagpieLM-4B-DPO-Data-v0.1
 model-index:
-- name: Llama-3.1-8B-Magpie-SFT-GMix-550K-DPO-02Mix
   results: []
 ---
-<!-- This model card has been generated automatically according to the information the Trainer had access to. You
-should probably proofread and complete it, then remove this comment. -->
-# Llama-3.1-8B-Magpie-SFT-GMix-550K-DPO-02Mix
-This model is a fine-tuned version of [Magpie-Align/Llama-3.1-8B-Magpie-SFT-GMix-550K](https://huggingface.co/Magpie-Align/Llama-3.1-8B-Magpie-SFT-GMix-550K) on the Magpie-Align/MagpieLM-4B-DPO-Data-v0.1 dataset.
-It achieves the following results on the evaluation set:
-- Loss: 0.3866
-- Rewards/chosen: -5.1623
-- Rewards/rejected: -6.8930
-- Rewards/accuracies: 0.8060
-- Rewards/margins: 1.7307
-- Logps/rejected: -1154.4679
-- Logps/chosen: -990.1328
-- Logits/rejected: -0.6102
-- Logits/chosen: -0.6705
-## Model description
-More information needed
-## Intended uses & limitations
-More information needed
-## Training and evaluation data
-More information needed
-## Training procedure
 ### Training hyperparameters
@@ -92,3 +210,72 @@ The following hyperparameters were used during training:
 - Pytorch 2.4.1+cu121
 - Datasets 3.0.0
 - Tokenizers 0.19.1

 ---
 library_name: transformers
 license: llama3.1
+base_model: Magpie-Align/MagpieLM-8B-SFT-v0.1
 tags:
 - alignment-handbook
 - trl
 - dpo
 - generated_from_trainer
 datasets:
+- Magpie-Align/MagpieLM-SFT-Data-v0.1
+- Magpie-Align/MagpieLM-DPO-Data-v0.1
 model-index:
+- name: MagpieLM-8B-Chat-v0.1
   results: []
 ---
+![Magpie](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/FWWILXrAGNwWr52aghV0S.png)
+# 🐦 MagpieLM-8B-Chat-v0.1
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="200" height="32"/>](https://api.wandb.ai/links/uw-nsl/0s1eegy2)
+## 🧐 About This Model
+*Model full name: Llama3.1-MagpieLM-8B-Chat-v0.1*
+This model is an aligned version of [meta-llama/Meta-Llama-3.1-8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B). It achieves state-of-the-art performance among open-aligned SLMs and outperforms open-weight instruct models of similar or larger size, including Llama-3-8B-Instruct, Llama-3.1-8B-Instruct, Qwen-2-7B-Instruct, and Gemma-2-9B-it.
+We apply the following standard alignment pipeline with two carefully crafted synthetic datasets.
+We first perform SFT using [Magpie-Align/MagpieLM-SFT-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-SFT-Data-v0.1).
+* **SFT Model Checkpoint:** [Magpie-Align/MagpieLM-8B-SFT-v0.1](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1)
+We then perform DPO on the [Magpie-Align/MagpieLM-DPO-Data-v0.1](https://huggingface.co/datasets/Magpie-Align/MagpieLM-DPO-Data-v0.1) dataset.
+## 🔥 Benchmark Performance
+Greedy Decoding
+- **Alpaca Eval 2: 58.18 (LC), 62.38 (WR)**
+- **Arena Hard: 48.4**
+- **WildBench WB Score (v2.0625): 44.72**
+**Benchmark Performance Compared to Other SOTA SLMs**
+![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/653df1323479e9ebbe3eb6cc/q1Rasy66h6lmaUP1KQ407.jpeg)
+## 👀 Other Information
+**License**: Please follow the [Meta Llama 3.1 Community License](https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/LICENSE).
+**Conversation Template**: Please use the Llama 3 chat template for the best performance.
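To make the template concrete, here is a minimal sketch of the Llama 3 prompt layout rendered by hand. The helper name `render_llama3_prompt` is hypothetical and exists only for illustration. The template shipped with the model's tokenizer is authoritative and may add details not shown here, so prefer `tokenizer.apply_chat_template` in real code.

```python
# Hypothetical helper for illustration only: renders a chat in the Llama 3
# prompt layout by hand. In practice, use tokenizer.apply_chat_template,
# which applies the template shipped with the model.

def render_llama3_prompt(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a Llama 3 style prompt."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn is a role header, a blank line, the content, then <|eot_id|>.
        parts.append(
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
    {"role": "user", "content": "Who are you?"},
]
prompt = render_llama3_prompt(messages)
print(prompt)
```

Sending raw text without these header and end-of-turn tokens would likely degrade response quality, which is why the card recommends the chat template.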
+## 🧐 How to use it?
+Please update transformers to the latest version by running `pip install git+https://github.com/huggingface/transformers`.
+You can then run conversational inference using the Transformers `pipeline` abstraction or by leveraging the Auto classes with the `generate()` function.
+```python
+import transformers
+import torch
+model_id = "Magpie-Align/MagpieLM-8B-Chat-v0.1"  # full Hub repo ID, including the organization
+pipeline = transformers.pipeline(
+    "text-generation",
+    model=model_id,
+    model_kwargs={"torch_dtype": torch.bfloat16},
+    device_map="auto",
+)
+messages = [
+    {"role": "system", "content": "You are Magpie, a friendly AI assistant."},
+    {"role": "user", "content": "Who are you?"},
+]
+outputs = pipeline(
+    messages,
+    max_new_tokens=256,
+)
+print(outputs[0]["generated_text"][-1])
+```
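The snippet above covers the `pipeline` route. For the Auto classes route with `generate()`, a minimal sketch could look like the following. The helper names `build_messages` and `generate_reply` are hypothetical, `do_sample=False` is chosen here to match the greedy decoding used for the reported benchmarks, and the repo ID is taken from `hub_model_id` in the DPO config.

```python
MODEL_ID = "Magpie-Align/MagpieLM-8B-Chat-v0.1"


def build_messages(user_prompt, system_prompt="You are Magpie, a friendly AI assistant."):
    """Build a chat in the role/content format expected by the chat template."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_reply(user_prompt, max_new_tokens=256):
    """Load the model (a multi-GB download) and generate one greedy reply."""
    # Imported lazily so build_messages stays usable without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Apply the chat template and tokenize in one step.
    inputs = tokenizer.apply_chat_template(
        build_messages(user_prompt),
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,  # greedy decoding
    )
    # Keep only the newly generated tokens, dropping the prompt.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)


# Uncomment to run; this downloads the model weights.
# print(generate_reply("Who are you?"))
```

Compared with `pipeline`, this route exposes the tokenized prompt and the raw output IDs, which helps when you need to inspect or post-process the generation.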
+---
+# Alignment Pipeline
+The detailed alignment pipeline is as follows.
+## Stage 1: Supervised Fine-tuning
+We use [Axolotl](https://github.com/OpenAccess-AI-Collective/axolotl) for SFT. Please refer to the model card of the [SFT checkpoint](https://huggingface.co/Magpie-Align/MagpieLM-8B-SFT-v0.1) and the config below for detailed configurations.
+[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+<details><summary>See axolotl config</summary>
+axolotl version: `0.4.1`
+```yaml
+base_model: meta-llama/Meta-Llama-3.1-8B
+model_type: LlamaForCausalLM
+tokenizer_type: AutoTokenizer
+chat_template: llama3
+load_in_8bit: false
+load_in_4bit: false
+strict: false
+main_process_port: 0
+datasets:
+  - path: Magpie-Align/MagpieLM-SFT-Data-v0.1
+    type: sharegpt
+    conversation: llama3
+dataset_prepared_path: last_run_prepared
+val_set_size: 0.001
+output_dir: axolotl_out/MagpieLM-8B-SFT-v0.1
+sequence_len: 8192
+sample_packing: true
+eval_sample_packing: false
+pad_to_sequence_len: true
+wandb_project: SynDa
+wandb_entity:
+wandb_watch:
+wandb_name: MagpieLM-8B-SFT-v0.1
+wandb_log_model:
+hub_model_id: Magpie-Align/MagpieLM-8B-SFT-v0.1
+gradient_accumulation_steps: 32
+micro_batch_size: 1
+num_epochs: 2
+optimizer: paged_adamw_8bit
+lr_scheduler: cosine
+learning_rate: 2e-5
+train_on_inputs: false
+group_by_length: false
+bf16: auto
+fp16:
+tf32: false
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: false
+early_stopping_patience:
+resume_from_checkpoint:
+logging_steps: 1
+xformers_attention:
+flash_attention: true
+warmup_ratio: 0.1
+evals_per_epoch: 5
+eval_table_size:
+saves_per_epoch:
+debug:
+deepspeed:
+weight_decay: 0.0
+fsdp:
+fsdp_config:
+special_tokens:
+  pad_token: <|end_of_text|>
+```
+</details><br>
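As a rough guide to what the SFT config implies per optimizer step, the sketch below multiplies out `micro_batch_size`, `gradient_accumulation_steps`, and `sequence_len`. The GPU count is an assumption, since the config does not state it, and the token figure is an upper bound that assumes every packed sequence is filled to `sequence_len`.

```python
# Values copied from the axolotl config above.
MICRO_BATCH_SIZE = 1
GRADIENT_ACCUMULATION_STEPS = 32
SEQUENCE_LEN = 8192


def sequences_per_step(num_gpus):
    """Packed sequences consumed per optimizer step across all GPUs."""
    return MICRO_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS * num_gpus


def max_tokens_per_step(num_gpus):
    """Upper bound on tokens per optimizer step (fully packed sequences)."""
    return sequences_per_step(num_gpus) * SEQUENCE_LEN


# num_gpus is an assumption: the config does not state the GPU count.
for num_gpus in (1, 4, 8):
    print(
        f"{num_gpus} GPU(s): {sequences_per_step(num_gpus)} sequences, "
        f"up to {max_tokens_per_step(num_gpus)} tokens per step"
    )
```

If you reproduce the run on a different number of GPUs, adjusting `gradient_accumulation_steps` so that this product stays constant is a common way to keep the effective batch size comparable.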
+## Stage 2: Direct Preference Optimization
 ### Training hyperparameters
 - Pytorch 2.4.1+cu121
 - Datasets 3.0.0
 - Tokenizers 0.19.1
+<details><summary>See alignment handbook configs</summary>
+```yaml
+# Customized Configs
+model_name_or_path: Magpie-Align/MagpieLM-8B-SFT-v0.1
+hub_model_id: Magpie-Align/MagpieLM-8B-Chat-v0.1
+output_dir: alignment_handbook_out/MagpieLM-8B-Chat-v0.1
+run_name: MagpieLM-8B-Chat-v0.1
+dataset_mixer:
+   Magpie-Align/MagpieLM-DPO-Data-v0.1: 1.0
+dataset_splits:
+- train
+- test
+preprocessing_num_workers: 24
+# DPOTrainer arguments
+bf16: true
+beta: 0.01
+learning_rate: 2.0e-7
+gradient_accumulation_steps: 16
+per_device_train_batch_size: 2
+per_device_eval_batch_size: 4
+num_train_epochs: 1
+max_length: 2048
+max_prompt_length: 1800
+warmup_ratio: 0.1
+logging_steps: 1
+lr_scheduler_type: cosine
+optim: adamw_torch
+torch_dtype: null
+# use_flash_attention_2: true
+do_eval: true
+evaluation_strategy: steps
+eval_steps: 100
+gradient_checkpointing: true
+gradient_checkpointing_kwargs:
+  use_reentrant: False
+log_level: info
+push_to_hub: true
+save_total_limit: 0
+seed: 42
+report_to:
+- wandb
+```
+</details><br>
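To illustrate the role of `beta` in the config above, here is a minimal sketch of the standard sigmoid DPO loss for a single preference pair. It assumes the trainer's default loss type, since the config does not set one, and the log-probabilities in the example are made up for illustration. This is not the training code itself.

```python
import math

BETA = 0.01  # `beta` from the config above


def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """Return (loss, chosen_reward, rejected_reward) for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the policy being trained or the frozen reference (SFT) model.
    """
    # Implicit rewards: scaled log-ratio between policy and reference.
    chosen_reward = beta * (policy_chosen - ref_chosen)
    rejected_reward = beta * (policy_rejected - ref_rejected)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)), written in a numerically stable form.
    loss = math.log1p(math.exp(-abs(margin))) + max(-margin, 0.0)
    return loss, chosen_reward, rejected_reward


# Made-up log-probabilities, for illustration only.
loss, chosen_reward, rejected_reward = dpo_loss(
    policy_chosen=-100.0,
    policy_rejected=-120.0,
    ref_chosen=-110.0,
    ref_rejected=-105.0,
)
print(f"loss={loss:.4f} margin={chosen_reward - rejected_reward:.4f}")
```

A small `beta` such as 0.01 scales the log-ratios down, so the policy can move further from the reference model before the loss saturates.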
+## 📚 Citation
+If you find the model, data, or code useful, please cite:
+```
+@article{xu2024magpie,
+	title={Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing},
+	author={Zhangchen Xu and Fengqing Jiang and Luyao Niu and Yuntian Deng and Radha Poovendran and Yejin Choi and Bill Yuchen Lin},
+	year={2024},
+	eprint={2406.08464},
+	archivePrefix={arXiv},
+	primaryClass={cs.CL}
+}
+```
+**Contact**
+Questions? Contact:
+- [Zhangchen Xu](https://zhangchenxu.com/) [zxu9 at uw dot edu], and
+- [Bill Yuchen Lin](https://yuchenlin.xyz/) [yuchenlin1995 at gmail dot com]