After a bumpy start (see my other thread about it), I'm starting to feel comfortable enough with SDXL that I probably won't look back at the 1.5 models. This wizard-hat-wearing cat was generated in A1111 with:

Prompt: "a cute kitty cat wearing a wizard hat, candy rays beaming out of the cat ears, (a swirling galaxy of candy pops background:0.7), 1980’s style digital art, hyperrealistic, paintbrush, shallow depth of field, bokeh, spotlight on face, cinematic lighting"

Negative (from a standard style I use): "(bad anatomy:1.1), (high contrast:1.3), watermark, text, inscription, signature, canvas frame, (over saturated:1.2), (glossy:1.1), cartoon, 3d, ((disfigured)), ((bad art)), ((b&w)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck))), Photoshop, video game, ugly, tiling, poorly drawn hands, 3d render"

Generated at 1024x1024 without refiner.

There are a few things I found you should be aware of when working with SDXL in A1111:

  • make sure you've upgraded A1111 to version 1.5.1 (do a "git pull" in the install directory)
  • I needed to add "--medvram" to my command-line arguments, otherwise I'd get out-of-memory errors (12 GB VRAM)
  • make sure your VAE is set to "Automatic" or that you're using the SDXL VAE (can be downloaded from Hugging Face). Older VAEs won't work
  • older LoRAs don't work and you will get errors
  • there is a noise-offset LoRA for SDXL (sd_xl_offset_example-lora_1.0) which does work, but I don't see much difference in the images. With the LoRA they are a tiny bit crisper. However, this LoRA doesn't work with the refiner model (you will get errors)
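For reference, here's how I set this up; the variable name is the one A1111's stock launcher scripts use, but double-check against your own install:

```shell
# Run inside the stable-diffusion-webui install directory to update A1111:
#   git pull
#
# Then enable --medvram in the launcher script.
# webui-user.bat (Windows):
set COMMANDLINE_ARGS=--medvram

# webui-user.sh (Linux/macOS) uses export syntax instead:
# export COMMANDLINE_ARGS="--medvram"
```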

And the biggest one for me:

  • don't use arbitrary image proportions; stick to the ones posted here: https://platform.stability.ai/docs/features/api-parameters This was the biggest mistake I made initially. Using other image sizes gave me super wonky images and very unsatisfying results. Since I started sticking to the recommended dimensions, my images have been much, much better.
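To save a click, this little sketch lists the dimensions from that page (as I read them; verify against the docs) and shows that every pair is a multiple of 64 and lands at roughly 1024x1024 pixels total:

```shell
# Recommended SDXL generation sizes; each is ~1 megapixel.
for size in 1024x1024 1152x896 896x1152 1216x832 832x1216 \
            1344x768 768x1344 1536x640 640x1536; do
  w=${size%x*}   # width:  strip the "x..." suffix
  h=${size#*x}   # height: strip the "...x" prefix
  echo "$size = $((w * h)) pixels"
done
```

So if you want widescreen, pick 1344x768 or 1536x640 rather than, say, 1920x1080 scaled down to arbitrary values.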

A word on the refiner model: as of now I don't see big quality improvements when I run it in img2img at about 0.1 - 0.25 denoising strength. I'll play around with higher denoising strengths and see what I can get out of it.

Anyway, I think SDXL is a huge improvement and I'm already getting really exciting results. Cheers :)