Tags: Text Generation · PEFT · Safetensors · llama-2 · Eval Results
dfurman committed
Commit: de8ab98 (1 parent: f8679a8)

Update README.md

Files changed (1):
  1. README.md +3 -1
README.md CHANGED
@@ -96,11 +96,13 @@ Example 3:
  >
  > Note: You can also add chocolate chips, dried fruit, or other mix-ins to the batter for extra flavor and texture. Enjoy your vegan banana bread!

+ <br>
+
  ## Model Description

  The architecture is a modification of a standard decoder-only transformer.

- The llama-2 models have been modified from a standard transformer in the following ways:
+ The llama-2-70b models have been modified from a standard transformer in the following ways:
  * It uses [grouped-query attention](https://arxiv.org/pdf/2305.13245.pdf) (GQA), a generalization of multi-query attention which uses an intermediate number of key-value heads.
  * It uses the [SwiGLU activation function](https://arxiv.org/abs/2002.05202)
  * It uses [rotary positional embeddings](https://arxiv.org/abs/2104.09864) (RoPE)
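
The three bullets in the updated README are dense, so here is a minimal PyTorch sketch of what each modification looks like in isolation. It is an illustrative sketch only, not code from this repo or from Meta's reference implementation; the function names, tensor shapes, and the `n_rep` parameter are assumptions made for the example.

```python
# Illustrative sketches of the three tweaks named in the README: GQA, RoPE,
# and SwiGLU. Shapes and names are assumptions, not llama-2-70b's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F


def repeat_kv(x: torch.Tensor, n_rep: int) -> torch.Tensor:
    """GQA: expand n_kv_heads key/value heads so that groups of n_rep query
    heads share one key/value head. x: (batch, n_kv_heads, seq_len, head_dim)."""
    if n_rep == 1:
        return x  # n_kv_heads == n_heads is ordinary multi-head attention
    b, h_kv, s, d = x.shape
    return x[:, :, None, :, :].expand(b, h_kv, n_rep, s, d).reshape(b, h_kv * n_rep, s, d)


def rope(x: torch.Tensor, base: float = 10000.0) -> torch.Tensor:
    """RoPE: rotate each (even, odd) channel pair of the queries/keys by an
    angle that grows with position, encoding relative position in the dot
    product. x: (batch, n_heads, seq_len, head_dim) with an even head_dim."""
    _, _, s, d = x.shape
    inv_freq = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)  # (d/2,)
    angles = torch.arange(s, dtype=torch.float32)[:, None] * inv_freq     # (s, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin  # 2-D rotation applied per channel pair
    out[..., 1::2] = x1 * sin + x2 * cos
    return out


class SwiGLU(nn.Module):
    """SwiGLU feed-forward block: silu(w1(x)) gates w3(x), then w2 projects
    back down, replacing the usual two-layer ReLU/GELU MLP."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)  # gate projection
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)  # value projection
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))
```

For scale, the Llama 2 paper uses 64 query heads and 8 key/value heads for the 70B model, which corresponds to `n_rep = 8` in the sketch above.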