Tags: Text Generation · PEFT · Safetensors · llama-2 · Eval Results
dfurman committed
Commit: de8ab98 (1 parent: f8679a8)

Update README.md

Files changed (1):
  1. README.md +3 -1
README.md CHANGED
@@ -96,11 +96,13 @@ Example 3:
  >
  > Note: You can also add chocolate chips, dried fruit, or other mix-ins to the batter for extra flavor and texture. Enjoy your vegan banana bread!

+ <br>
+
  ## Model Description

  The architecture is a modification of a standard decoder-only transformer.

- The llama-2 models have been modified from a standard transformer in the following ways:
+ The llama-2-70b models have been modified from a standard transformer in the following ways:
  * It uses [grouped-query attention](https://arxiv.org/pdf/2305.13245.pdf) (GQA), a generalization of multi-query attention which uses an intermediate number of key-value heads.
  * It uses the [SwiGLU activation function](https://arxiv.org/abs/2002.05202)
  * It uses [rotary positional embeddings](https://arxiv.org/abs/2104.09864) (RoPE)
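
The three bullets in the updated README are dense, so here is a minimal PyTorch sketch of what each modification looks like in isolation. It is an illustrative sketch only, not code from this repo or from Meta's reference implementation; the function names, tensor shapes, and the `n_rep` parameter are assumptions made for the example.

```python
# Illustrative sketches of the three tweaks named in the README: GQA, RoPE,
# and SwiGLU. Shapes and names are assumptions, not llama-2-70b's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """GQA: expand n_kv_heads key/value heads so that groups of n_rep query
    heads share one key/value head. x: (batch, n_kv_heads, seq_len, head_dim)."""
    if n_rep == 1:
        return x  # n_kv_heads == n_heads is ordinary multi-head attention
    b, h_kv, s, d = x.shape
    return x[:, :, None, :, :].expand(b, h_kv, n_rep, s, d).reshape(b, h_kv * n_rep, s, d)


def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """RoPE: rotate each (even, odd) channel pair of the queries/keys by an
    angle that grows with position, encoding relative position in the dot
    product. x: (batch, n_heads, seq_len, head_dim) with an even head_dim."""
    _, _, s, d = x.shape
    inv_freq = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)  # (d/2,)
    angles = torch.arange(s, dtype=torch.float32)[:, None] * inv_freq     # (s, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # 2-D rotation applied per channel pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(w1(x)) gates w3(x), then w2 projects
    back down, replacing the usual two-layer ReLU/GELU MLP."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

For scale, the Llama 2 paper uses 64 query heads and 8 key/value heads for the 70B model, which corresponds to `n_rep = 8` in the sketch above.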