
ConvLLaVA-JP Model Card

This is a pretrained checkpoint; you can use it to instruction-tune your own multimodal models.

Check out the instructions here

Model details

Model type: ConvLLaVA-JP is a vision-language model that can converse about input images.
This model is an LVLM trained with laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft as the image encoder and llm-jp/llm-jp-1.3b-v1.0 as the text decoder. It supports high-resolution image input at 768 x 768.
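Because the encoder works at a fixed 768 x 768 resolution, input images generally need to be resized and cropped to that size before encoding. The sketch below is a minimal, hypothetical illustration of that step using Pillow; the model repository ships its own preprocessing code, and `prepare_image` here is not part of its API.

```python
from PIL import Image

# Hypothetical preprocessing sketch: ConvLLaVA-JP expects 768 x 768 inputs.
# The exact pipeline (normalization, tensor conversion) lives in the model
# repo; this only illustrates reaching the target resolution.
def prepare_image(img: Image.Image, size: int = 768) -> Image.Image:
    # Resize the shorter side to `size`, then center-crop to size x size.
    w, h = img.size
    scale = size / min(w, h)
    img = img.resize((round(w * scale), round(h * scale)), Image.BICUBIC)
    w, h = img.size
    left = (w - size) // 2
    top = (h - size) // 2
    return img.crop((left, top, left + size, top + size))

demo = Image.new("RGB", (1024, 640))
print(prepare_image(demo).size)  # (768, 768)
```

Center-cropping keeps the aspect ratio intact at the cost of trimming the longer dimension; padding to a square is an equally valid alternative.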

Training dataset

Acknowledgement

License

Apache-2.0

Model size: 2.1B parameters (Safetensors; F32 and BF16 tensors)

Dataset used to train toshi456/ConvLLaVA-JP-1.3b-768-Pretrain