The model often enters infinite generation loops

#32
by sszymczyk - opened

For example, with this code:

from transformers import pipeline
import torch

model_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
pipe = pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device="cuda",
)

messages = [
    {"role": "user", "content": """Given the family relationships:
* Carol is Emily's parent.
* Emily is Henry's parent.
* Abigail is Gary's parent.
* Gary is Sean's parent.
* Emily is Abigail's parent.
What is Carol's relationship to Abigail?
Select the correct answer:
1. Carol is Abigail's grandchild.
2. Carol is Abigail's sibling.
3. Carol is Abigail's grandparent.
Enclose the selected answer number in the <ANSWER> tag, for example: <ANSWER>1</ANSWER>."""},
]
outputs = pipe(
    messages,
    max_new_tokens=4096,
    do_sample=False,  # greedy decoding
)
assistant_response = outputs[0]["generated_text"][-1]["content"]
print(assistant_response)

I got the following output:

$ python3 transformers-llama-3.1-2.py 
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████| 4/4 [00:00<00:00, 27.01it/s]
/home/phm/.local/opt/miniconda3/envs/llama.cpp/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:567: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/home/phm/.local/opt/miniconda3/envs/llama.cpp/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:572: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Let's break down the relationships:

* Carol is Emily's parent.
* Emily is Henry's parent.
* Abigail is Gary's parent.
* Gary is Sean's parent.
* Emily is Abigail's parent.

This creates a loop where Emily is both Carol's child and Abigail's child. This is a contradiction, but let's try to find a relationship between Carol and Abigail.

Since Emily is both Carol's child and Abigail's child, and Emily is also Henry's parent, we can conclude that Carol and Abigail are not siblings (option 2 is incorrect).

Now, let's look at the options:

* Option 1: Carol is Abigail's grandchild. This is not possible, as Carol is Emily's parent, and Emily is Abigail's child.
* Option 3: Carol is Abigail's grandparent. This is also not possible, as Carol is Emily's parent, and Emily is Abigail's child.

However, if we re-examine the relationships, we can see that Carol is Emily's parent, and Emily is Abigail's child. This makes Carol Abigail's parent-in-law, but not a grandparent. However, this is not an option.

Wait, let's re-examine the relationships again. Since Emily is both Carol's child and Abigail's child, and Emily is also Henry's parent, we can conclude that Carol and Abigail are not siblings, and Carol is not Abigail's grandchild or grandparent.

However, if we look at the relationships again, we can see that Carol is Emily's parent, and Emily is Abigail's child. This makes Carol Abigail's parent-in-law, but not a grandparent. However, this is not an option.

Let's try to find a relationship between Carol and Abigail. Since Emily is both Carol's child and Abigail's child, and Emily is also Henry's parent, we can conclude that Carol and Abigail are not siblings (option 2 is incorrect).

Now, let's look at the options:

* Option 1: Carol is Abigail's grandchild. This is not possible, as Carol is Emily's parent, and Emily is Abigail's child.
* Option 3: Carol is Abigail's grandparent. This is also not possible, as Carol is Emily's parent, and Emily is Abigail's child.

However, if we re-examine the relationships, we can see that Carol is Emily's parent, and Emily is Abigail's child. This makes Carol Abigail's parent-in-law, but not a grandparent. However, this is not an option.

Wait, let's re-examine the relationships again. Since Emily is both Carol's child and Abigail's child, and Emily is also Henry's parent, we can conclude that Carol and Abigail are not siblings, and Carol is not Abigail's grandchild or grandparent.

However, if we look at the relationships again, we can see that Carol is Emily's parent, and Emily is Abigail's child. This makes Carol Abigail's parent-in-law, but not a grandparent. However, this is not an option.

Let's try to find a relationship between Carol and Abigail. Since Emily is both Carol's child and Abigail's child, and Emily is also Henry's parent, we can conclude that Carol and Abigail are not siblings (option 2 is incorrect).

Now, let's look at the options:

* Option 1: Carol is Abigail's grandchild. This is not possible, as Carol is Emily's parent, and Emily is Abigail's child.
* Option 3: Carol is Abigail's grandparent. This is also not possible, as Carol is Emily's parent, and Emily is Abigail's child.

However, if we re-examine the relationships, we can see that Carol is Emily's parent, and Emily is Abigail's child. This makes Carol Abigail's parent-in-law, but not a grandparent. However, this is not an option.

Wait, let's re-examine the relationships again. Since Emily is both Carol's child and Abigail's child, and Emily is also Henry's parent, we can conclude that Carol and Abigail are not siblings, and Carol is not Abigail's grandchild or grandparent.

However, if we look at the relationships again, we can see that Carol is Emily's parent, and Emily is Abigail's child. This makes Carol Abigail's parent-in-law, but not a grandparent. However, this is not an option.

Let's try to find a relationship between Carol and Abigail. Since Emily is both Carol's child and Abigail's child, and Emily is also Henry's parent, we can conclude that Carol and Abigail are not siblings (option 2 is incorrect).

Now, let's look at the options:
...

This is not an isolated incident: when I ran my farel-bench benchmark on the 8B model (450 prompts), it entered an infinite generation loop a dozen times. It happens with the transformers library, with llama.cpp, and via the OpenRouter API.

I'm experiencing the same issue. I have likewise tested multiple providers and found that it happens with all of them, and it affects the 70B model as well.

I tested llama3.1-8b on the lmsys arena:

######################################################################################

Given the family relationships:

• Carol is Emily's parent.
• Emily is Henry's parent.
• Abigail is Gary's parent.
• Gary is Sean's parent.
• Emily is Abigail's parent.

What is Carol's relationship to Abigail?
Select the correct answer:

1. Carol is Abigail's grandchild.
2. Carol is Abigail's sibling.
3. Carol is Abigail's grandparent.

Enclose the selected answer number in the <ANSWER> tag, for example: <ANSWER>1</ANSWER>.

######################################################################################

<ANSWER>3</ANSWER>

Explanation:

  • Carol is Emily's parent (given).
  • Emily is Abigail's parent (given).
  • Therefore, Carol is Abigail's grandparent.

Actually, you can try replacing <ANSWER> with \<ANSWER\>; at least that is how I got the correct output.
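For reference, the suggested change would look something like this in the reproduction script (a sketch; only the final line of the prompt changes, written as a raw string so the backslashes survive, and the variable name is made up):

# Hypothetical tweak: escape the tag so the model sees literal backslashes.
prompt_tail = r"Enclose the selected answer number in the \<ANSWER\> tag, for example: \<ANSWER\>1\</ANSWER\>."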

@CHNtentes Yeah, I know that I can change the prompt a little or perhaps use some sampling settings, but that's not the point. The problem is that the model deterministically generates looped token sequences, which indicates that there's something wrong with the model.
@Mikael110 I checked the logs more carefully, and indeed the same problem exists in the 70B model too, but it does not occur as often as with the 8B model. So far I have not encountered the problem in the 405B model.
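As an aside, degenerate loops like this are easy to flag automatically when running a benchmark. A minimal, hypothetical helper (not part of farel-bench; the name and thresholds are made up):

def looks_looped(text: str, window: int = 200, min_repeats: int = 3) -> bool:
    # Heuristic: looped output repeats its tail verbatim, so count
    # how many times the trailing `window` characters occur overall.
    if len(text) < window:
        return False
    tail = text[-window:]
    return text.count(tail) >= min_repeats

Running looks_looped(assistant_response) on the output above should return True once the "Now, let's look at the options" block starts repeating.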

It seems I have the same problem, though I'm testing only the 8B variant.
The model usually generates tokens until max_new_tokens is reached, often repeating the original answer.
Has anybody got a fix?

After playing around a bit with the models, I've noticed that asking them to make guesses based on slightly incorrect or contradictory information often triggers the infinite loop. It actually affects all of the sizes: 8B, 70B, and even 405B. That suggests to me the issue is likely an implementation problem of some kind.

A short example is this prompt:

Musical Quiz: Guess the title of songs based on clues:

1. This song's title references a celestial event and was a major hit for an Australian singer in the 80s.

It hints at the song "Total Eclipse of the Heart", but that is contradicted by the nationality of the singer (Bonnie Tyler is Welsh, not Australian). In my testing this almost always breaks the 70B and 405B models, and usually breaks the 8B model, depending on the sampler settings.

An output from the 405B model can be seen here. That was with no repetition penalty; with a repetition penalty it becomes far more broken.

@sszymczyk do you have transformers installed from the PyPI package? If so, can you share which version? Alternatively, if you are using the GitHub source, can you confirm whether this commit is in your repo? https://github.com/huggingface/transformers/commit/d5a99dfcee6e94065cb7c83cc8ab6fc5daa0cc4e

@edinan I have transformers 4.43.1 installed.

@edinan I noticed that a new version is available, so I updated transformers to 4.43.2, but I still have the looped generation problem with this version.
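For anyone comparing their own setup, a trivial way to print the installed version (a minimal sketch):

import transformers

print(transformers.__version__)  # e.g. 4.43.2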

@Mikael110 For me it almost looks like the Llama 3.1 models were trained to "try again" after failing to answer; I guess that can sometimes cause an infinite thought loop.

Thanks for the info @sszymczyk! We're still looking into the root cause of the issue. For now, avoiding greedy decoding may help, i.e. replace do_sample=False with something like:

do_sample=True,
temperature=0.6,
top_p=0.9,
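Applied to the reproduction script from the top of the thread, that suggestion would look like this (a sketch; same pipeline object and parameter values as above):

outputs = pipe(
    messages,
    max_new_tokens=4096,
    do_sample=True,   # sample instead of decoding greedily
    temperature=0.6,
    top_p=0.9,
)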

I have the same issue when using Llama3.1-8B-Instruct for some multilingual translation tasks; waiting for a useful solution.

[Screenshot attached: 截屏2024-07-25 14.42.38.png]

It looks like you are using the Alpaca prompt template (### Instruction, ### Input, ### Response), which is not correct for Llama 3.1 models and will lead to degraded and unpredictable results.
Using the proper template might solve your particular issue. Docs for the proper prompt template can be found here.

I'd also recommend looking into an interface that has built-in support for prompt templates, so that you don't have to manage it yourself.
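As an illustration, with plain transformers you can let the tokenizer render the correct Llama 3.1 chat template instead of hand-writing an Alpaca-style prompt. A minimal sketch (the translation message is just a made-up example):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
messages = [{"role": "user", "content": "Translate to French: Hello, world."}]

# Renders the message list with the model's own chat template, including
# the trailing assistant header so generation starts in the right place.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)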
