
This autoregressive model belongs to a series of rather small language models trained on the BabyLM data:

  • the baby_llama model has few parameters and was trained on a small data set (10M tokens)
  • the teenie_llama model has the same number of parameters but was trained on more tokens of text (100M)
  • the weenie_llama model was trained on the small data set, but has more parameters/weights
  • the tweenie_llama model features both: more tokens (the larger data set) and more parameters
|                 | baby_llama | teenie_llama | weenie_llama | tweenie_llama |
|-----------------|------------|--------------|--------------|---------------|
| Parameters      | 2.97M      | 2.97M        | 11.44M       | 11.44M        |
| Hidden layers   | 8          | 8            | 16           | 16            |
| Attention heads | 8          | 8            | 16           | 16            |
| Embedding size  | 128        | 128          | 256          | 256           |
| Context size    | 128        | 128          | 256          | 256           |
| Vocab size      | 16k        | 16k          | 16k          | 16k           |
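
For reference, the tweenie_llama column above corresponds roughly to the following `transformers` config. This is a minimal sketch read off the table, not the checkpoint's actual config file; every argument the table does not specify (intermediate size, RoPE settings, etc.) is left at library defaults and is an assumption.

```python
from transformers import LlamaConfig

# Hypothetical config matching the tweenie_llama column of the table;
# unlisted arguments fall back to LlamaConfig defaults (an assumption).
config = LlamaConfig(
    vocab_size=16_000,            # vocab size: 16k
    hidden_size=256,              # embedding size
    num_hidden_layers=16,         # hidden layers
    num_attention_heads=16,       # attention heads
    max_position_embeddings=256,  # context size
)
print(config)
```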
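The serverless Inference API is disabled for this repository, so the model is meant to be run locally. Below is a minimal usage sketch, assuming the checkpoint loads with the standard `transformers` Auto classes; the prompt text is just an illustration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "bbunzeck/tweenie_llama"

# Download the tokenizer and weights from the Hub
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# Generate a short continuation of an arbitrary prompt
inputs = tokenizer("Once upon a time", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```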
