Performance

Optimizations

The optimizations below can significantly improve speed and reduce VRAM usage.

Attention

We will always apply scaled dot product attention from PyTorch.
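For reference, scaled dot product attention computes softmax(QK^T / sqrt(d)) V. The sketch below is a plain-Python illustration of that formula for a single head. It is not the fused kernel that PyTorch exposes as `torch.nn.functional.scaled_dot_product_attention`; the function name `sdp_attention` and the list-of-lists layout are assumptions made purely for illustration.

```python
import math

def sdp_attention(q, k, v):
    """Reference single-head attention: softmax(q @ k^T / sqrt(d)) @ v.

    q, k, v are lists of equally sized vectors (lists of floats).
    Illustrative only; real implementations use fused GPU kernels.
    """
    d = len(q[0])
    out = []
    for qi in q:
        # Scaled similarity between this query and every key.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # Numerically stable softmax over the keys.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out

# Two identical keys receive equal weight, so the output is the mean of the values.
result = sdp_attention([[1.0, 0.0]], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 0.0]])
```

The naive version materializes one score per query/key pair, which is why memory-efficient kernels matter for long frame sequences.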

FP8

FP8 requires torch >= 2.1.0. To enable FP8, add --unet-in-fp8-e4m3fn to your command line arguments.
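A launcher script could guard the flag with a version check before adding it. The sketch below is one hedged way to do that: `parse_version` and `supports_fp8` are hypothetical helpers, not part of WebUI or this extension, and in practice the version string would come from `torch.__version__`.

```python
import re

def parse_version(version: str) -> tuple:
    """Extract (major, minor, patch) from strings like '2.0.1+cu117'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = match.groups()
    return (int(major), int(minor), int(patch or 0))

def supports_fp8(torch_version: str) -> bool:
    """True when the version satisfies the torch >= 2.1.0 requirement."""
    return parse_version(torch_version) >= (2, 1, 0)

# Hypothetical launcher logic: only add the flag when the check passes.
args = []
if supports_fp8("2.1.0+cu121"):
    args.append("--unet-in-fp8-e4m3fn")
```

With the torch 2.0.1+cu117 build used for the measurements in this document, `supports_fp8` returns `False`, so the flag would be left out.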

LCM

Latent Consistency Model (LCM) is a recent breakthrough in the Stable Diffusion community. You can generate images / videos in as few as 6-8 steps if you

  • select the LCM, Euler A, Euler, or DDIM sampler
  • apply the LCM LoRA
  • use a low CFG scale (1-2 is recommended)
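The checklist above can be expressed as a small validation helper. This is a sketch only: `check_lcm_settings` and its parameter names are hypothetical and do not correspond to any WebUI API. The parameter `cfg_scale` assumes that the low CFG value in the list refers to the CFG scale slider; the sampler names and numeric ranges are taken from the list above.

```python
# Sampler names from the list above, compared case-insensitively.
LCM_SAMPLERS = {"lcm", "euler a", "euler", "ddim"}

def check_lcm_settings(sampler, steps, cfg_scale, lcm_lora_applied):
    """Return a list of warnings; an empty list means the settings match."""
    warnings = []
    if sampler.lower() not in LCM_SAMPLERS:
        warnings.append("use the LCM, Euler A, Euler or DDIM sampler")
    if not lcm_lora_applied:
        warnings.append("apply the LCM LoRA")
    if not 1 <= cfg_scale <= 2:
        warnings.append("keep the CFG value between 1 and 2")
    if not 6 <= steps <= 8:
        warnings.append("6-8 steps is the range quoted above")
    return warnings

ok = check_lcm_settings("Euler A", 8, 1.5, True)
problems = check_lcm_settings("DPM++ 2M", 20, 7, False)
```

Here `ok` is empty, while `problems` contains one warning per item that deviates from the recommendation.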

I have contributed this sampler to Stable Diffusion WebUI through a pull request, so you no longer need this extension to use the LCM sampler. I have removed the LCM sampler from this repository.

VRAM

These measurements are for the original A1111 WebUI. Measuring VRAM consumption in Forge is not meaningful, because @lllyasviel implemented batch VAE decode based on your available VRAM.

Actual VRAM usage depends on your image size and context batch size. You can reduce the image size to reduce VRAM usage. Changing the context batch size is discouraged, because it conflicts with the training specification.

The following measurements are for SD1.5 + AnimateDiff, tested on Ubuntu 22.04, NVIDIA 4090, torch 2.0.1+cu117, H=W=512, frame=16 (default setting). "w/" and "w/o" mean that Batch cond/uncond in Settings/Optimization is checked and unchecked, respectively.

| Optimization | VRAM w/ | VRAM w/o |
| --- | --- | --- |
| No optimization | 12.13GB | |
| xformers/sdp | 5.60GB | 4.21GB |
| sub-quadratic | 10.39GB | |

For SDXL + HotShot + SDP, tested on Ubuntu 22.04, NVIDIA 4090, torch 2.0.1+cu117, H=W=512, frame=8 (default setting), you need 8.66GB VRAM.

For SDXL + AnimateDiff + SDP, tested on Ubuntu 22.04, NVIDIA 4090, torch 2.0.1+cu117, H=1024, W=768, frame=16, you need 13.87GB VRAM.
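To make the dependence on image size and frame count concrete, the sketch below computes the shape of the latent tensor that the UNet processes at each denoising step. It assumes the standard Stable Diffusion latent layout (4 channels, 8x spatial downsampling) and that checking Batch cond/uncond doubles the batch by running the conditional and unconditional passes together. `latent_shape` and `element_count` are hypothetical helpers, and this is not a VRAM predictor: actual usage also depends on the attention implementation, model weights, and intermediate activations.

```python
def latent_shape(frames, height, width, batch_cond_uncond=True):
    """Shape (batch, channels, h, w) of the latents fed to the UNet."""
    if height % 8 or width % 8:
        raise ValueError("height and width must be multiples of 8")
    # Assumption: cond and uncond passes are stacked along the batch axis.
    batch = frames * (2 if batch_cond_uncond else 1)
    return (batch, 4, height // 8, width // 8)

def element_count(shape):
    """Total number of elements in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total

default = latent_shape(16, 512, 512)   # (32, 4, 64, 64)
smaller = latent_shape(16, 384, 384)   # (32, 4, 48, 48)
ratio = element_count(smaller) / element_count(default)  # 0.5625
```

Dropping from 512x512 to 384x384 leaves about 56% of the latent elements, which illustrates why reducing image size is the suggested way to lower VRAM usage.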

Batch Size

Batch size on WebUI is replaced internally by the GIF frame number: one full GIF is generated in one batch. If you want to generate multiple GIFs at once, please change batch number.

Batch number is NOT the same as batch size. In the A1111 WebUI, the batch number control sits above the batch size control. Batch number is the number of batches that run sequentially, whereas batch size is the number of items processed in parallel within one batch. You do not have to worry much when you increase batch number, but you do need to watch your VRAM when you increase batch size (which, in this extension, is the video frame number). You do not need to change batch size at all when you are using this extension.
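The difference can be pictured as a loop. The sketch below is a simplified illustration, not the extension's actual code: batches run one after another, while the frames of each GIF are processed together as a single batch. `plan_generation` is a hypothetical helper.

```python
def plan_generation(batch_count, video_frames):
    """Return one entry per sequential batch; each entry lists the frame
    indices that are processed in parallel to form a single GIF."""
    plan = []
    for _ in range(batch_count):                        # sequential: one GIF at a time
        frames_in_parallel = list(range(video_frames))  # parallel: drives VRAM
        plan.append(frames_in_parallel)
    return plan

plan = plan_generation(batch_count=3, video_frames=16)
gifs = len(plan)                              # GIFs generated one after another
peak_parallel = max(len(b) for b in plan)     # frames held in memory at once
```

Raising `batch_count` only lengthens the loop, whereas raising `video_frames` increases how much must fit in VRAM at the same time.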

We might develop an approach to support batch size on WebUI, but this has very low priority and we cannot commit to a specific date.