Diffusion models, LoRA fine-tuning, ControlNet, inpainting, image generation pipelines, and ComfyUI workflows.
Hugging Face Diffusers is the go-to library for diffusion models. It provides pre-trained models for image generation, inpainting, super-resolution, and more.
from diffusers import StableDiffusionXLPipeline
import torch
# ── Basic Text-to-Image ──
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")
image = pipe(
    "A serene Japanese garden with cherry blossoms",
    num_inference_steps=30,
    guidance_scale=7.5,
    width=1024,
    height=1024,
).images[0]
image.save("garden.png")
# ── Image-to-Image ──
# SDXL checkpoints need the XL pipeline classes
from diffusers import StableDiffusionXLImg2ImgPipeline
from PIL import Image
i2i_pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
).to("cuda")
input_image = Image.open("garden.png").convert("RGB")
result = i2i_pipe(
    prompt="Oil painting style",
    image=input_image,
    strength=0.7,  # 0.0 = keep original, 1.0 = ignore original
    num_inference_steps=30,
).images[0]
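What `strength` actually does in diffusers' img2img is skip the early part of the noise schedule: the input image is noised partway, and only the last `strength` fraction of `num_inference_steps` runs. A minimal sketch of that bookkeeping, mirroring the pipeline's internal timestep logic:

```python
def effective_steps(num_inference_steps: int, strength: float) -> int:
    # diffusers' img2img noises the input image `strength` of the way along
    # the schedule, then denoises from there -- so only this many steps run.
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    return init_timestep

print(effective_steps(30, 0.7))  # 21 of the 30 requested steps actually run
print(effective_steps(30, 1.0))  # 30: full generation, input image ignored
```

This is why very low `strength` values look almost unchanged: with `strength=0.1` and 30 steps, only 3 denoising steps execute.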
# ── Inpainting ──
from diffusers import StableDiffusionXLInpaintPipeline
inpaint_pipe = StableDiffusionXLInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-inpainting-1.0",
    torch_dtype=torch.float16,
).to("cuda")
result = inpaint_pipe(
    prompt="a red sports car",
    image=base_image,    # PIL image to edit
    mask_image=mask,     # White = inpaint, Black = keep
    num_inference_steps=30,
).images[0]

| Parameter | Default | Description | Effect |
|---|---|---|---|
| num_inference_steps | 50 | Number of denoising steps | More = better quality, slower (20-50 ideal) |
| guidance_scale | 7.5 | How closely to follow prompt | Higher = more prompt adherence, less creative |
| width / height | 512 | Output image dimensions | Must be divisible by 8. 1024x1024 for SDXL |
| negative_prompt | None | What to avoid | Reduce artifacts: "blurry, ugly, bad quality" |
| seed | random | Reproducibility | Same seed = same image (deterministic) |
| strength | 0.8 | Img2Img transformation amount | 0.0=original, 1.0=fully new |
| num_images_per_prompt | 1 | Images per generation | Batch multiple images |
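The `seed` row works through `torch.Generator`: the pipeline draws its initial latent noise from the generator you pass, so a fixed seed yields identical starting noise and hence an identical image. A small sketch of the mechanism (the actual pipeline call is shown as a comment):

```python
import torch

def seeded_latents(seed: int, shape=(1, 4, 128, 128)) -> torch.Tensor:
    # Pipelines sample their starting latents from a torch.Generator;
    # same seed -> same noise -> same image.
    gen = torch.Generator("cpu").manual_seed(seed)
    return torch.randn(shape, generator=gen)

assert torch.equal(seeded_latents(42), seeded_latents(42))
# In a real pipeline call:
# image = pipe(prompt, generator=torch.Generator("cuda").manual_seed(42)).images[0]
```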
Low-Rank Adaptation (LoRA) lets you train custom styles and concepts for Stable Diffusion on a single consumer GPU with as few as 10-20 images.
import torch
from diffusers import StableDiffusionXLPipeline
from diffusers import AutoencoderKL, DDPMScheduler, UNet2DConditionModel
from peft import LoraConfig
# ── Train LoRA with SDXL (using diffusers + PEFT) ──
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")
# ── Use PEFT LoRA config ──
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],  # UNet attention projections
)
# For SDXL fine-tuning, use accelerate for multi-GPU
# accelerate launch train_lora_sdxl.py \
# --pretrained_model_name_or_path=$MODEL_DIR \
# --instance_data_dir=$INSTANCE_DIR \
# --output_dir=$OUTPUT_DIR \
# --instance_prompt="a photo of sks person" \
# --resolution=1024 \
# --train_batch_size=1 \
# --gradient_accumulation_steps=4 \
# --learning_rate=1e-4 \
# --lr_scheduler="constant" \
# --lr_warmup_steps=0 \
# --max_train_steps=500 \
# --mixed_precision="fp16"
# ── Inference with LoRA ──
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("./my-lora-weights", weight_name="pytorch_lora_weights.safetensors")
image = pipe("a photo of sks person as a cyborg", num_inference_steps=30,
             guidance_scale=7.5).images[0]
# ── Combine multiple LoRAs ──
# (each must first be loaded via pipe.load_lora_weights(..., adapter_name=...))
pipe.set_adapters(["style_lora", "character_lora"], adapter_weights=[0.7, 1.0])

ControlNet adds spatial conditioning to Stable Diffusion, enabling precise control over composition, pose, depth, and edge structure of generated images.
from diffusers import StableDiffusionXLControlNetPipeline, ControlNetModel
from controlnet_aux import CannyDetector, OpenposeDetector, MidasDetector
import torch
# ── Edge/Canny ControlNet ──
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-canny-sdxl-1.0",
    torch_dtype=torch.float16,
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")
# Detect edges
canny = CannyDetector()
control_image = canny(input_image)
result = pipe(
    "A futuristic city with neon lights",
    image=control_image,
    controlnet_conditioning_scale=0.8,
    num_inference_steps=30,
).images[0]
# ── Multi-ControlNet (pose + depth) ──
controlnet_pose = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16)
controlnet_depth = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[controlnet_pose, controlnet_depth],  # a list is wrapped into a multi-ControlNet
    torch_dtype=torch.float16,
).to("cuda")
result = pipe(
    "A woman in a red dress walking in a garden",
    image=[pose_image, depth_image],            # one control image per ControlNet
    controlnet_conditioning_scale=[0.9, 0.7],   # one scale per ControlNet
    num_inference_steps=30,
).images[0]

| Type | Model | Use For | Input |
|---|---|---|---|
| Canny Edge | controlnet-canny-sdxl-1.0 | Edge-guided generation | Canny edge map |
| Depth | controlnet-depth-sdxl-1.0 | Depth-aware generation | Grayscale depth map |
| Pose (OpenPose) | controlnet-openpose-sdxl-1.0 | Pose-guided humans | Skeleton pose image |
| Scribble | controlnet-scribble-sdxl-1.0 | Draw your composition | Simple line scribble |
| Segmentation | controlnet-seg-sdxl-1.0 | Region-based generation | Semantic segmentation mask |
| Normal | controlnet-normal-sdxl-1.0 | Surface normal guidance | Normal map |
| Tile | controlnet-tile-sdxl-1.0 | Upscale with detail preservation | Tiled/resized image |
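The preprocessors in `controlnet_aux` produce these control inputs (e.g. `CannyDetector` wraps OpenCV's Canny). As a dependency-light illustration of what a canny-style control image is, here is a gradient-magnitude edge map in plain NumPy; this is a rough stand-in for intuition, not the real detector:

```python
import numpy as np

def simple_edge_map(gray: np.ndarray, thresh: float = 0.2) -> np.ndarray:
    # White-on-black edge map, the shape of input controlnet-canny expects:
    # threshold the gradient magnitude of a grayscale image.
    gy, gx = np.gradient(gray.astype(np.float32))
    mag = np.hypot(gx, gy)
    mag /= mag.max() + 1e-8
    return (mag > thresh).astype(np.uint8) * 255

# Synthetic image: black left half, white right half -> one vertical edge
img = np.zeros((64, 64), dtype=np.uint8)
img[:, 32:] = 255
edges = simple_edge_map(img)   # edge pixels light up around column 32
```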
ComfyUI is a node-based interface for Stable Diffusion that enables complex workflows through visual programming; it is a popular choice among power users.
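ComfyUI also runs headless: a running server accepts workflow graphs (the JSON you export with "Save (API Format)") via `POST /prompt`, by default on port 8188. A minimal sketch of queueing a job from Python; the two-node graph fragment, checkpoint filename, and server address are illustrative, not a complete workflow:

```python
import json
import urllib.request

def build_prompt_payload(workflow: dict, client_id: str = "cheatsheet") -> bytes:
    # ComfyUI's /prompt endpoint expects {"prompt": <graph>, "client_id": ...}
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")

# Fragment of an API-format graph: node ids map to {class_type, inputs};
# ["1", 1] references output slot 1 (the CLIP model) of node "1".
workflow = {
    "1": {"class_type": "CheckpointLoaderSimple",
          "inputs": {"ckpt_name": "sd_xl_base_1.0.safetensors"}},
    "2": {"class_type": "CLIPTextEncode",
          "inputs": {"text": "a serene Japanese garden", "clip": ["1", 1]}},
}

payload = build_prompt_payload(workflow)
# req = urllib.request.Request("http://127.0.0.1:8188/prompt", data=payload)
# urllib.request.urlopen(req)  # queues the graph on a running ComfyUI server
```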
| Model | Type | Resolution | Best For | License |
|---|---|---|---|---|
| Stable Diffusion XL 1.0 | Latent Diffusion | 1024x1024 | General image generation | OpenRAIL-M |
| Stable Diffusion 3.5 | Flow Matching | 1MP+ | Prompt adherence, text rendering | Stability Community License |
| FLUX.1 (Black Forest) | Flow Matching | 1024x1024 | Prompt following, photorealism | Apache 2.0 |
| DALL-E 3 | Diffusion (API) | 1024x1024 | Best text rendering, easy API | Proprietary |
| Midjourney v6 | Proprietary | 1024x1024 | Artistic quality, aesthetics | Commercial license |
| PixArt-Sigma | Transformer | 1024x1024 | Open-source SDXL alternative | Apache 2.0 |
| Kandinsky 3.0 | Latent Diffusion | 1024x1024 | Russian/English bilingual | Apache 2.0 |
| Playground v2.5 | Latent Diffusion | 1024x1024 | Photorealism, composition | Custom license |