---
license: openrail++
language:
- en
thumbnail: "https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/thumbnail.png"
pipeline_tag: text-to-image
tags:
- stable-diffusion
- stable-diffusion-diffusers
inference: true
widget:
- text: >-
    masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn,
    cumulonimbus clouds, lighting, blue sky, falling leaves, garden
  example_title: example 1girl
- text: >-
    masterpiece, best quality, 1boy, medium hair, blonde hair, blue eyes,
    bishounen, colorful, autumn, cumulonimbus clouds, lighting, blue sky,
    falling leaves, garden
  example_title: example 1boy
library_name: diffusers
---

# Hermitage XL

## Overview

Hermitage XL is a high-resolution latent text-to-image diffusion model derived from Stable Diffusion XL 1.0. It was fine-tuned with a learning rate of 4e-7 for 5000 steps at a batch size of 16 on a curated dataset of high-quality anime-style images.

Example prompt: **_1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lighting, detailed sky, garden_**

- Use it with the [`Stable Diffusion Webui`](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
- Use it with 🧨 [`diffusers`](https://huggingface.co/docs/diffusers/index)
- Use it with [`ComfyUI`](https://github.com/comfyanonymous/ComfyUI)

## Features

1. High-resolution images: The model was trained at 1024x1024 resolution. Training used the [NovelAI Aspect Ratio Bucketing Tool](https://github.com/NovelAI/novelai-aspect-ratio-bucketing) so that the model could also learn from non-square resolutions.
2. Anime-style generation: Given a text prompt, the model creates high-quality anime-style images.
3. Fine-tuned diffusion process: The model uses a fine-tuned diffusion process intended to produce high-quality, distinctive images.
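Aspect ratio bucketing groups training images into resolution buckets of roughly equal pixel area, so non-square images do not have to be cropped to a square. The following is a simplified sketch of the idea, not the NovelAI tool's actual implementation; `make_buckets`, `assign_bucket`, and the `min_dim`, `max_dim`, and `step` defaults are illustrative assumptions.

```python
import math


def make_buckets(max_area=1024 * 1024, min_dim=512, max_dim=2048, step=64):
    """Return sorted (width, height) buckets with width * height <= max_area.

    Assumption: both sides are multiples of `step` and lie in [min_dim, max_dim].
    """
    buckets = set()
    width = min_dim
    while width <= max_dim:
        # Tallest height (a multiple of `step`) that keeps the area within budget.
        height = min(max_dim, (max_area // width) // step * step)
        if height >= min_dim:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored bucket for the other orientation
        width += step
    return sorted(buckets)


def assign_bucket(image_size, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's (compared in log space)."""
    w, h = image_size
    target = math.log(w / h)
    return min(buckets, key=lambda b: abs(math.log(b[0] / b[1]) - target))


buckets = make_buckets()
print(assign_bucket((1920, 1080), buckets))
```

Comparing ratios in log space treats a 2:1 landscape image and a 1:2 portrait image symmetrically; each image would then be resized (and lightly cropped) to its assigned bucket before training.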

## Model Details

- **Developed by:** [Linaqruf](https://github.com/Linaqruf)
- **Model type:** Diffusion-based text-to-image generative model
- **Model Description:** This is a model that can be used to generate and modify anime-themed images based on text prompts.
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL)
- **Finetuned from model:** [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)

## How to Use

- Download `Hermitage XL` [here](https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/hermitage-xl.safetensors); the model is in `.safetensors` format.
- Use Danbooru-style tags as the prompt instead of natural language; otherwise the output tends to look realistic rather than anime-style.
- You can use any generic negative prompt, or use the following suggested negative prompt to guide the model toward high-aesthetic generations:

  ```
  lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
  ```

- The following tags should also be prepended to prompts to get high-aesthetic results:

  ```
  masterpiece, best quality, illustration, beautiful detailed, finely detailed, dramatic light, intricate details
  ```
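The prompt conventions above (Danbooru-style tags, quality tags prepended, a fixed negative prompt) can be assembled mechanically. This is a minimal sketch assuming you keep tags as a Python list; `build_prompt`, `QUALITY_TAGS`, and `NEGATIVE_PROMPT` are hypothetical names, not part of any library.

```python
# Suggested quality tags, prepended to every prompt.
QUALITY_TAGS = [
    "masterpiece", "best quality", "illustration", "beautiful detailed",
    "finely detailed", "dramatic light", "intricate details",
]

# Suggested negative prompt, joined into a single comma-separated string.
NEGATIVE_PROMPT = ", ".join([
    "lowres", "bad anatomy", "bad hands", "text", "error", "missing fingers",
    "extra digit", "fewer digits", "cropped", "worst quality", "low quality",
    "normal quality", "jpeg artifacts", "signature", "watermark", "username",
    "blurry",
])


def build_prompt(tags, quality_tags=QUALITY_TAGS):
    """Prepend quality tags to Danbooru-style tags, dropping empty and duplicate tags."""
    seen = set()
    ordered = []
    for tag in list(quality_tags) + list(tags):
        tag = tag.strip()
        if tag and tag not in seen:
            seen.add(tag)
            ordered.append(tag)
    return ", ".join(ordered)


print(build_prompt(["1girl", "brown hair", "green eyes", "masterpiece"]))
```

The resulting strings can be passed as the `prompt` and `negative_prompt` arguments of the diffusers pipeline; deduplication keeps a tag such as `masterpiece` from appearing twice when it is already in your own tag list.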

## 🧨 Diffusers

Make sure to upgrade diffusers to >= 0.18.2:

```
pip install diffusers --upgrade
```

In addition, make sure to install `transformers`, `safetensors`, `accelerate`, and the invisible watermark package:

```
pip install invisible_watermark transformers accelerate safetensors
```

Running the pipeline (if you don't swap the scheduler, it runs with the default **EulerDiscreteScheduler**; in this example we swap it for **EulerAncestralDiscreteScheduler**):

```py
import torch
from diffusers import StableDiffusionXLPipeline, EulerAncestralDiscreteScheduler
from diffusers.models import AutoencoderKL

model = "Linaqruf/hermitage-xl"

# Use the standalone SDXL VAE instead of the one bundled with the checkpoint
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")

pipe = StableDiffusionXLPipeline.from_pretrained(
    model,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    vae=vae,
)

# Replace the default EulerDiscreteScheduler with EulerAncestralDiscreteScheduler
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

prompt = "masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=12,
    # SDXL size micro-conditioning
    target_size=(1024, 1024),
    original_size=(4096, 4096),
    num_inference_steps=50,
).images[0]

image.save("anime_girl.png")
```

## Limitations

1. This model inherits the [limitations](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0#limitations) of Stable Diffusion XL 1.0.
2. This model is overfitted and does not follow prompts well, because it was fine-tuned for only 5000 steps on a small-scale dataset.
3. It is only a preview model, intended to find good hyperparameters and a training configuration for Stable Diffusion XL 1.0.

## Example

Here are some cherry-picked samples and a comparison between available models: