metadata

license: apache-2.0
datasets:
  - turing-motors/LLaVA-Pretrain-JA
language:
  - ja

ConvLLaVA-JP Model Card

This is a pretrained checkpoint. You can use it as a starting point for instruction tuning your own multimodal models.

Check out the instructions here

Model details

Model type: ConvLLaVA-JP is a vision-language model that can converse about input images.
This large vision-language model (LVLM) was trained using laion/CLIP-convnext_large_d_320.laion2B-s29B-b131K-ft as the image encoder and llm-jp/llm-jp-1.3b-v1.0 as the text decoder. It supports high-resolution image input of 768 x 768 pixels.
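To give a rough sense of what a 768 x 768 input means for a ConvNeXt-based encoder, the sketch below computes the size of the final feature map. It assumes the standard ConvNeXt layout (a stride-4 stem followed by three stride-2 downsampling layers, for a total stride of 32). Whether ConvLLaVA-JP adds further downsampling or a projector that changes the number of visual tokens passed to the text decoder is not stated in this card, so treat the numbers as an illustration only. The helper name `convnext_grid` is hypothetical.

```python
from math import prod

# Assumption: standard ConvNeXt strides (stem = 4, then three stride-2 stages).
# This is not confirmed for ConvLLaVA-JP itself.
STAGE_STRIDES = (4, 2, 2, 2)
TOTAL_STRIDE = prod(STAGE_STRIDES)  # 32


def convnext_grid(image_size: int, total_stride: int = TOTAL_STRIDE) -> tuple[int, int]:
    """Return (grid side, number of spatial positions) for a square input image."""
    if image_size <= 0 or image_size % total_stride != 0:
        raise ValueError(
            f"image_size must be a positive multiple of {total_stride}, got {image_size}"
        )
    side = image_size // total_stride
    return side, side * side


if __name__ == "__main__":
    # 768 x 768 is the input resolution named in this card.
    print(convnext_grid(768))  # (24, 576)
    # 320 x 320 is the resolution in the image encoder's checkpoint name.
    print(convnext_grid(320))  # (10, 100)
```

Under these assumptions, a 768 x 768 image yields a 24 x 24 grid, or 576 spatial positions, compared with 100 positions at the encoder's 320 x 320 resolution.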

Training dataset

LLaVA-Pretrain-JA

Acknowledgement

ConvLLaVA
LLM-jp
Open CLIP

License

Apache-2.0