An error occurred: shape mismatch

#33
by VeeP - opened

Hello,

Would someone be able to help me with this error?

My code prompts for a local image on my system then runs it through the model. All the files are locally stored.

It seems the file is opened and is about to be processed when the error below occurs.

My assumption is it will analyze the image and provide a text description.

The only oddity I notice is that I do have a GPU, but the script always uses the CPU. Could that be the cause?

2024-09-11 07:26:36,483 - INFO - Generating description for the image...
2024-09-11 07:26:36,508 - INFO - Image opened successfully. Original size: (512, 512)
2024-09-11 07:26:36,518 - INFO - Image resized to: (448, 448)
2024-09-11 07:26:36,521 - INFO - Model moved to cpu
2024-09-11 07:26:36,521 - INFO - Processing image with the processor...
2024-09-11 07:26:36,542 - INFO - Input tensor 'input_ids' shape: torch.Size([1, 82])
2024-09-11 07:26:36,542 - INFO - Input tensor 'attention_mask' shape: torch.Size([1, 82])
2024-09-11 07:26:36,542 - INFO - Input tensor 'pixel_values' shape: torch.Size([1024, 1176])
2024-09-11 07:26:36,542 - INFO - Input tensor 'image_grid_thw' shape: torch.Size([1, 3])
2024-09-11 07:26:36,543 - INFO - Generating output from the model...
Setting pad_token_id to eos_token_id:151645 for open-end generation.
2024-09-11 07:26:38,410 - ERROR - An error occurred: shape mismatch: value tensor of shape [256, 3584] cannot be broadcast to indexing result of shape [0, 3584]
None

Sorry if this is a duplicate message.

Thanks,

V

I think your prompt text has no image token in it.
Check whether your text contains the tokens <|vision_start|>, <|image_pad|>, and <|vision_end|>.
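As a quick sanity check (a minimal sketch; the token strings are Qwen2-VL's vision special tokens, and the function name is mine), you can scan the formatted prompt for the markers before calling generate:

```python
# Qwen2-VL wraps each image in these special tokens. If they are missing
# from the prompt text, the model finds zero image positions to fill in,
# which matches the "[256, 3584] -> [0, 3584]" broadcast error above.
VISION_TOKENS = ("<|vision_start|>", "<|image_pad|>", "<|vision_end|>")

def has_vision_tokens(prompt_text: str) -> bool:
    """Return True if every vision marker appears in the prompt text."""
    return all(tok in prompt_text for tok in VISION_TOKENS)

# A prompt built without the chat template lacks the markers:
assert not has_vision_tokens("Describe this image.")
# A properly templated prompt contains them:
assert has_vision_tokens(
    "<|vision_start|><|image_pad|><|vision_end|>Describe this image."
)
```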

Oh, I thought that was handled by qwen-vl-utils. I am not sure where to check this, but here is my code:

try:
    from qwen_vl_utils import process_vision_info

    image = Image.open(image_path).convert('RGB') 
    logging.info(f"Image opened successfully. Original size: {image.size}, Mode: {image.mode}")

    # Resize the image 
    image = image.resize((IMAGE_SIZE, IMAGE_SIZE))
    logging.info(f"Image resized to: {image.size}")

    # Device handling with explicit CUDA check
    if torch.cuda.is_available():
        device = torch.device("cuda")
        logging.info("CUDA-enabled GPU is available. Moving model and inputs to GPU.")
    else:
        device = torch.device("cpu")
        logging.info("No CUDA-enabled GPU found. Using CPU for processing.")

    model = model.to(device)

    conversation = [
        {
            "role": "user",
            "content": [
                {"type": "image", "image": image, "image_id": image_id} if image_id else {"type": "image", "image": image},
                {"type": "text", "text": "Describe this image."}
            ]
        }
    ]

The tokens should be added by processor.apply_chat_template; make sure you format the conversation with it and pass the resulting text to the processor.
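For reference, a minimal sketch of that flow, following the usage shown on the Qwen2-VL model card (the model name and image path are placeholders; process_vision_info is the helper from qwen-vl-utils, which accepts local paths or file:// URIs and handles resizing itself):

```python
def build_inputs(conversation, model_name="Qwen/Qwen2-VL-7B-Instruct"):
    # Third-party imports kept local so the sketch reads standalone.
    from transformers import AutoProcessor
    from qwen_vl_utils import process_vision_info

    processor = AutoProcessor.from_pretrained(model_name)
    # apply_chat_template inserts <|vision_start|><|image_pad|><|vision_end|>
    # for every image entry in the conversation.
    text = processor.apply_chat_template(
        conversation, tokenize=False, add_generation_prompt=True
    )
    image_inputs, video_inputs = process_vision_info(conversation)
    return processor(
        text=[text],
        images=image_inputs,
        videos=video_inputs,
        padding=True,
        return_tensors="pt",
    )

# The image can be referenced by path; no manual PIL resize is needed.
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "file:///path/to/image.jpg"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]
```

The returned dict can then be moved to the model's device and passed to model.generate(**inputs), as in the model card example.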
