The original example code included two BOS tokens during inference.

I modified it so that only one BOS token is included.

I only see one BOS token when I run that code. I'm not sure why you are seeing two BOS tokens.

In the latest version of the transformers library, a BOS token is added at the beginning of the prompt in this part:

prompt = processor.tokenizer.apply_chat_template(
    [{'role': 'user', 'content': "<image>\nWhat's the content of the image?"}],
    tokenize=False,
    add_generation_prompt=True
)

and another BOS token is added at the beginning in this part:

inputs = processor(text=prompt, images=image, return_tensors="pt")

Therefore, the final input_ids contain two BOS tokens (token id 2):

print(inputs.input_ids)

>> tensor([[     2,      2,    106,   1645,    108, 256000,    108,   1841, 235303,
         235256,    573,   3381,    576,    573,   2416, 235336,    107,    108,
            106,   2516,    108]])