---
license: apache-2.0
language:
- en
datasets:
- ILSVRC/imagenet-1k
---

# Model Card for VIT-MAE-r

VIT-MAE-r is a fine-tuned version of MAE for image reconstruction. We release a version fine-tuned from [MAE-Large](https://huggingface.co/facebook/vit-mae-large).

## Model Details

VIT-MAE-r has already been converted to the Hugging Face format and can be loaded directly with the `from_pretrained` method.

### Model Sources

- **Repository:** [LM4LV](https://github.com/bytetriper/LM4LV)
- **Paper:** [LM4LV: A Frozen Large Language Model for Low-level Vision Tasks](https://arxiv.org/abs/2405.15734v1)
- **Source model:** [MAE-Large](https://huggingface.co/facebook/vit-mae-large)

## How to Get Started with the Model

Use the code below to load the model and its image processor; a fuller reconstruction sketch follows at the end of this card.

```python
from transformers import AutoImageProcessor, AutoModelForPreTraining

processor = AutoImageProcessor.from_pretrained("bytetriper/vit-mae-r")
model = AutoModelForPreTraining.from_pretrained("bytetriper/vit-mae-r")
```

## Evaluation

This model achieves an rFID of 1.24 on the ImageNet validation set, evaluated with the standard TensorFlow evaluation suite provided by [Guided-Diffusion](https://github.com/openai/guided-diffusion/tree/main/evaluations).

## Citation

**BibTeX:**

```bibtex
@article{zheng2024lm4lv,
  title={LM4LV: A Frozen Large Language Model for Low-level Vision Tasks},
  author={Zheng, Boyang and Gu, Jinjin and Li, Shijun and Dong, Chao},
  journal={arXiv preprint arXiv:2405.15734},
  year={2024}
}
```

## Model Card Authors

Boyang Zheng

## Model Card Contact

bytetriper@gmail.com
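
## Example: Reconstructing an Image

As referenced above, the sketch below runs an image through the model end to end. It is a minimal illustration, not code from the LM4LV repository: the sample image URL and the `mask_ratio` override are assumptions made for demonstration, while the model calls use the standard `transformers` ViT-MAE API.

```python
import torch
import requests
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForPreTraining

processor = AutoImageProcessor.from_pretrained("bytetriper/vit-mae-r")
model = AutoModelForPreTraining.from_pretrained("bytetriper/vit-mae-r")
model.eval()

# Assumption: disable MAE's random patch masking so the full image is
# reconstructed; by default ViT-MAE masks config.mask_ratio of the patches.
model.config.mask_ratio = 0.0

# Any RGB image works; this COCO sample URL is just an example input.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# `logits` holds per-patch pixel predictions; `unpatchify` folds them back
# into an image tensor of shape (batch, channels, height, width).
reconstruction = model.unpatchify(outputs.logits)
print(reconstruction.shape)  # torch.Size([1, 3, 224, 224])
```

Note that if the checkpoint was trained with per-patch pixel normalization (`norm_pix_loss`), the raw predictions are normalized per patch and need de-normalization before they can be viewed as an image.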