
This autoregressive model belongs to a series of rather small language models trained on the BabyLM data:

  • the baby_llama model has few parameters and was trained on a small data set (10M tokens)
  • the teenie_llama model has the same number of parameters but was trained on more tokens of text (100M)
  • the weenie_llama model was trained on the small data set, but has more parameters/weights
  • the tweenie_llama model features both: more tokens (the larger data set) and more parameters
|                 | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|-----------------|------------|--------------|--------------|---------------|
| Parameters      | 2.97M      | 2.97M        | 11.44M       | 11.44M        |
| Hidden layers   | 8          | 8            | 16           | 16            |
| Attention heads | 8          | 8            | 16           | 16            |
| Embedding size  | 128        | 128          | 256          | 256           |
| Context size    | 128        | 128          | 256          | 256           |
| Vocab size      | 16k        | 16k          | 16k          | 16k           |
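
For reference, the tweenie_llama column above corresponds roughly to the following `transformers` config. This is a minimal sketch read off the table, not the checkpoint's actual config file; every argument the table does not specify (intermediate size, RoPE settings, etc.) is left at library defaults and is an assumption.

```python
from transformers import LlamaConfig

# Hypothetical config matching the tweenie_llama column of the table;
# unlisted arguments fall back to LlamaConfig defaults (an assumption).
config = LlamaConfig(
    vocab_size=16_000,            # vocab size: 16k
    hidden_size=256,              # embedding size
    num_hidden_layers=16,         # hidden layers
    num_attention_heads=16,       # attention heads
    max_position_embeddings=256,  # context size
)
print(config)
```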
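The serverless Inference API is disabled for this repository, so the model is meant to be run locally. Below is a minimal usage sketch, assuming the checkpoint loads with the standard `transformers` Auto classes; the prompt text is just an illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "bbunzeck/tweenie_llama"

# Download the tokenizer and weights from the Hub
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation of an arbitrary prompt
inputs = tokenizer("Once upon a time", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```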
