---
license: openrail++
language:
- en
thumbnail: "https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/thumbnail.png"  
pipeline_tag: text-to-image
tags:
- stable-diffusion
- stable-diffusion-diffusers
inference: true
widget:
- text: >-
    masterpiece, best quality, 1girl, brown hair, green eyes, colorful, autumn,
    cumulonimbus clouds, lighting, blue sky, falling leaves, garden
  example_title: example 1girl
- text: >-
    masterpiece, best quality, 1boy, medium hair, blonde hair, blue eyes,
    bishounen, colorful, autumn, cumulonimbus clouds, lighting, blue sky,
    falling leaves, garden
  example_title: example 1boy
library_name: diffusers
---

<style>
  .title-container {
    display: flex;
    justify-content: center;
    align-items: center;
    height: 100vh; /* Adjust this value to position the title vertically */
  }
  .title {
    font-size: 3em;
    text-align: center;
    color: #333;
    font-family: Arial, sans-serif;
    text-transform: uppercase;
    letter-spacing: 0.05em;
    padding: 0.5em 0;
    box-shadow: 0px 0px 20px 0px rgba(0,0,0,0.15);
    background: transparent;
  }
  .title span {
    background: -webkit-linear-gradient(45deg, #fe6b8b 30%, #ff8e53 90%);
    -webkit-background-clip: text;
    -webkit-text-fill-color: transparent;
  }
  .image-grid {
    display: grid;
    grid-template-columns: repeat(3, 1fr);
    gap: 0.5em;
  }
  .image-item {
    box-shadow: 0px 0px 10px 0px rgba(0,0,0,0.15);
    padding: 10px;
  }
  .image-item img {
    width: 100%;
    height: 100%;
    object-fit: cover;
    border-radius: 10px;
    transition: transform .2s;
  }
  .image-item img:hover {
    transform: scale(1.1);
  }
  .custom-table {
    table-layout: fixed;
    width: 100%;
    border-collapse: collapse;
  }
  .custom-table td {
    width: 50%;
    vertical-align: top;
    padding: 10px;
    box-shadow: 0px 0px 10px 0px rgba(0,0,0,0.15);
  }
  .custom-image {
    width: 100%;
    height: 100%;
    object-fit: cover;
    border-radius: 10px;
    transition: transform .2s; 
  }
  .custom-image:hover {
    transform: scale(1.1);
  }
</style>

<h1 class="title"><span>Hermitage XL</span></h1>

<div class="image-grid">
  <div class="image-item">
    <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample1.png">
      <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample1.png">
    </a>
  </div>
  <div class="image-item">
    <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample2.png">
      <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample2.png">
    </a>
  </div>
  <div class="image-item">
    <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample3.png">
      <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample3.png">
    </a>
  </div>
  <div class="image-item">
    <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample4.png">
      <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample4.png">
    </a>
  </div>
  <div class="image-item">
    <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample5.png">
      <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample5.png">
    </a>
  </div>
  <div class="image-item">
    <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/sample6.png">
      <img src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/sample6.png">
    </a>
  </div>
</div>

<hr>

## Overview

Hermitage XL is a high-resolution latent text-to-image diffusion model derived from Stable Diffusion XL 1.0. It was fine-tuned for 5000 steps with a learning rate of 4e-7 and a batch size of 16 on a curated dataset of high-quality anime-style images.

Example prompt: **_1girl, white hair, golden eyes, beautiful eyes, detail, flower meadow, cumulonimbus clouds, lighting, detailed sky, garden_**

- Use it with the [`Stable Diffusion Webui`](https://github.com/AUTOMATIC1111/stable-diffusion-webui)
- Use it with 🧨 [`diffusers`](https://huggingface.co/docs/diffusers/index)
- Use it with the [`ComfyUI`](https://github.com/comfyanonymous/ComfyUI)

<hr>

## Features

1. High-Resolution Images: The model was trained at 1024x1024 resolution. Training used the [NovelAI Aspect Ratio Bucketing Tool](https://github.com/NovelAI/novelai-aspect-ratio-bucketing) so that the model could also be trained at non-square resolutions.
2. Anime-styled Generation: Given a text prompt, the model can create high-quality anime-styled images.
3. Fine-Tuned Diffusion Process: The model uses a fine-tuned diffusion process aimed at high-quality, distinctive image output.
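
The idea behind aspect ratio bucketing in item 1 can be sketched roughly as follows. This is a simplified illustration of the general technique, not the code of the NovelAI tool and not the configuration used to train Hermitage XL. The function names and the `step`, `min_side`, and `max_side` values are assumptions chosen for the example. Only the 1024x1024 area budget comes from this card.

```python
# Simplified sketch of aspect ratio bucketing (illustrative; not the NovelAI tool itself).
# Assumed values: 64-pixel steps, sides between 512 and 2048 pixels.

def make_buckets(max_area=1024 * 1024, step=64, min_side=512, max_side=2048):
    """Enumerate (width, height) pairs whose pixel count fits within max_area."""
    buckets = set()
    for width in range(min_side, max_side + 1, step):
        # Largest height (a multiple of step) that keeps width * height within the budget.
        height = min(max_side, (max_area // width) // step * step)
        if height >= min_side:
            buckets.add((width, height))
            buckets.add((height, width))  # mirrored orientation
    return sorted(buckets)


def nearest_bucket(width, height, buckets):
    """Pick the bucket whose aspect ratio is closest to the image's aspect ratio."""
    ratio = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - ratio))


buckets = make_buckets()
print(len(buckets), "buckets, for example:", buckets[:3])
print("1920x1080 image goes to bucket:", nearest_bucket(1920, 1080, buckets))
```

Each training image would then be resized and cropped to its assigned bucket, and batches would be drawn from a single bucket at a time so that all images in a batch share one shape.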

<hr>

## Model Details

- **Developed by:** [Linaqruf](https://github.com/Linaqruf)
- **Model type:** Diffusion-based text-to-image generative model
- **Model Description:** This is a model that can be used to generate and modify anime-themed images based on text prompts. 
- **License:** [CreativeML Open RAIL++-M License](https://huggingface.co/stabilityai/stable-diffusion-2/blob/main/LICENSE-MODEL)
- **Finetuned from model:** [Stable Diffusion XL 1.0](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0)
<hr>

## How to Use
- Download `Hermitage XL` [here](https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/hermitage-xl.safetensors). The model is in `.safetensors` format.
- Use Danbooru-style tags as the prompt instead of natural language; otherwise you will get realistic results instead of anime.
- You can use any generic negative prompt, or use the following suggested negative prompt to guide the model towards high-aesthetic generations:
```
lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry
```
- Additionally, prepend the following to your prompts to get high-aesthetic results:
```
masterpiece, best quality, illustration, beautiful detailed, finely detailed, dramatic light, intricate details
```
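
As a minimal sketch of the prompting advice above, the helper below prepends the suggested quality tags to a list of Danbooru-style tags. The function name `build_prompt` and the constant names are hypothetical and exist only for this illustration. The tag strings are copied from this section.

```python
from typing import Sequence

# Tag strings copied from the suggestions above; the names are illustrative only.
QUALITY_TAGS = (
    "masterpiece, best quality, illustration, beautiful detailed, "
    "finely detailed, dramatic light, intricate details"
)
NEGATIVE_PROMPT = (
    "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, "
    "fewer digits, cropped, worst quality, low quality, normal quality, "
    "jpeg artifacts, signature, watermark, username, blurry"
)


def build_prompt(subject_tags: Sequence[str], quality_tags: str = QUALITY_TAGS) -> str:
    """Join Danbooru-style tags into one prompt, with the quality tags first."""
    cleaned = [tag.strip() for tag in subject_tags if tag.strip()]  # drop empty tags
    return ", ".join([quality_tags, *cleaned])


prompt = build_prompt(["1girl", "white hair", "golden eyes", "flower meadow"])
print(prompt)
```

The resulting `prompt` and `NEGATIVE_PROMPT` strings can be passed as the positive and negative prompt in any of the front ends listed in the overview.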
<hr>

## 🧨 Diffusers 

Make sure to upgrade diffusers to >= 0.18.2:
```
pip install diffusers --upgrade
```

In addition, make sure to install `transformers`, `safetensors`, and `accelerate`, as well as `invisible_watermark`:
```
pip install invisible_watermark transformers accelerate safetensors
```
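
To confirm that the installed `diffusers` meets the 0.18.2 minimum mentioned above, a small check along these lines may help. It is a sketch using only the standard library. The helper names `version_tuple` and `meets_minimum` are made up for this example, and the comparison only looks at the leading numeric parts of the version string.

```python
import re
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Extract the leading numeric components, e.g. '0.19.0.dev0' -> (0, 19, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, required: str = "0.18.2") -> bool:
    """Return True if the installed version is at least the required one."""
    return version_tuple(installed) >= version_tuple(required)


try:
    installed = metadata.version("diffusers")
    print("diffusers", installed, "OK" if meets_minimum(installed) else "is too old")
except metadata.PackageNotFoundError:
    print("diffusers is not installed")
```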

Running the pipeline (if you don't swap the scheduler, it will run with the default **EulerDiscreteScheduler**; in this example we swap it for **EulerAncestralDiscreteScheduler**):
```py
import torch
from diffusers import AutoencoderKL, EulerAncestralDiscreteScheduler, StableDiffusionXLPipeline

model = "Linaqruf/hermitage-xl"

# Load the SDXL VAE separately and pass it to the pipeline.
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")

# Load the fp16 variant of the model weights.
pipe = StableDiffusionXLPipeline.from_pretrained(
    model,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
    vae=vae,
)

# Swap the default EulerDiscreteScheduler for the ancestral variant.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

prompt = "masterpiece, best quality, 1girl, green hair, sweater, looking at viewer, upper body, beanie, outdoors, watercolor, night, turtleneck"
negative_prompt = "lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

# target_size and original_size are SDXL size-conditioning inputs.
image = pipe(
    prompt,
    negative_prompt=negative_prompt,
    width=1024,
    height=1024,
    guidance_scale=12,
    target_size=(1024, 1024),
    original_size=(4096, 4096),
    num_inference_steps=50,
).images[0]

image.save("anime_girl.png")
```
<hr>

## Limitation 
1. This model inherits the [limitations](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0#limitations) of Stable Diffusion XL 1.0.
2. This model is overfitted and does not follow prompts well, because it was fine-tuned for 5000 steps on a small-scale dataset.
3. It is only a preview model, intended to find good hyperparameters and a training configuration for Stable Diffusion XL 1.0.

<hr>

## Example

Here are some cherry-picked samples and a comparison between available models:

<table class="custom-table">
  <tr>
    <td>
      <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image1.png">
        <img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image1.png" alt="sample1">
      </a>
      <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image3.png">
        <img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image3.png" alt="sample3">
      </a>
    </td>
    <td>
      <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image2.png">
        <img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image2.png" alt="sample2">
      </a>
      <a href="https://huggingface.co/Linaqruf/hermitage-xl/blob/main/sample_images/image4.png">
        <img class="custom-image" src="https://huggingface.co/Linaqruf/hermitage-xl/resolve/main/sample_images/image4.png" alt="sample4">
      </a>
    </td>
  </tr>
</table>