
ViT-GPT2-FlowerCaptioner

This model is a fine-tuned version of nlpconnect/vit-gpt2-image-captioning on the FlowerEvolver dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3075
  • Rouge1: 66.3702
  • Rouge2: 45.5642
  • Rougel: 61.401
  • Rougelsum: 64.0587
  • Gen Len: 49.97
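The Rouge1 score above is a unigram-overlap F1 between generated and reference captions (the reported numbers come from the standard ROUGE scorer, which additionally applies stemming and other normalization). A minimal pure-Python sketch of the idea, not the exact scorer used for these results:

```python
from collections import Counter

def rouge1_f(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated and a reference caption."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # clipped overlap: each reference token can be matched at most once
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(rouge1_f("a flower with red petals", "a flower with red petals"))  # 1.0
```

Identical captions score 1.0, disjoint ones 0.0; the evaluation-set average above (66.37) is reported on a 0-100 scale.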

Sample usage

With Python:

import torch
from transformers import pipeline

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
FlowerCaptioner = pipeline("image-to-text", model="cristianglezm/ViT-GPT2-FlowerCaptioner", device=device)
FlowerCaptioner(["flower1.png"])
# A flower with 12 petals in a smooth gradient of green and blue. 
# The center is green with black accents. The stem is long and green.

With JavaScript:

import { pipeline } from '@xenova/transformers';

// Allocate a pipeline for image-to-text
let pipe = await pipeline('image-to-text', 'cristianglezm/ViT-GPT2-FlowerCaptioner-ONNX');

let out = await pipe('flower image url');
// A flower with 12 petals in a smooth gradient of green and blue. 
// The center is green with black accents. The stem is long and green.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 3
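The linear scheduler with warmup ramps the learning rate from 0 up to 5e-05 over the first 500 steps, then decays it linearly toward 0 for the rest of training. A minimal sketch of that schedule shape (the `total_steps` value here is illustrative, not taken from this training run; transformers' `get_linear_schedule_with_warmup` implements the same curve):

```python
def linear_lr(step: int, base_lr: float = 5e-5,
              warmup_steps: int = 500, total_steps: int = 1000) -> float:
    """Linear warmup to base_lr over warmup_steps, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)

print(linear_lr(250))   # halfway through warmup -> 2.5e-05
print(linear_lr(1000))  # end of training -> 0.0
```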

Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 0.6755        | 1.0   | 100  | 0.5339          | 60.9402 | 39.3331 | 54.6889 | 59.45     | 36.75   |
| 0.3666        | 2.0   | 200  | 0.3331          | 65.5149 | 43.0245 | 59.3121 | 62.7329   | 52.82   |
| 0.2983        | 3.0   | 300  | 0.3075          | 66.3702 | 45.5642 | 61.401  | 64.0587   | 49.97   |

Framework versions

  • Transformers 4.33.2
  • Pytorch 2.4.1+cu124
  • Datasets 2.20.0
  • Tokenizers 0.13.3
