What is Stable Diffusion?
Stable Diffusion is the foundation of open-source AI image generation. First released by Stability AI in 2022, it runs on your own hardware — no cloud service, no monthly fee, no usage limits. The model weights are freely available, and an enormous community has built on top of them: thousands of community-trained models, style adaptors called LoRAs, inpainting tools, upscalers, and powerful visual interfaces like ComfyUI and AUTOMATIC1111.
It is the most customisable image AI in existence. It is also the most complex to set up.
Which version to use
The Stable Diffusion ecosystem can be confusing — different base models, different UIs. Here's the practical breakdown:
| Model | Best for | Notes |
|---|---|---|
| SD 1.5 | Widest community support, most LoRAs | Older but most-customised |
| SDXL | Higher resolution, better prompt following | Needs 8+ GB VRAM |
| SD 3.x | Latest, best text handling | Newer, fewer community models |
| FLUX | Best image quality, good text | By Black Forest Labs — different architecture |
For most new users: start with SDXL for quality. If you want to build on a massive ecosystem of community styles, SD 1.5 has the most resources.
The magic moment
After the painful setup is behind you, download a community LoRA trained on a specific illustration style — say, Studio Ghibli watercolour, or a specific architectural photography aesthetic. Apply it to any prompt and watch every image shift to that style, consistently, without any prompting tricks. That's the moment it clicks: this is not a product with guardrails. This is a tool that bends to your vision. The ceiling is unlimited.
Setup options
Option 1: AUTOMATIC1111 (most features, most documented)
- Install Python 3.10 and Git
- Clone the repo: `git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui`
- Download a model checkpoint from Civitai or Hugging Face and place it in `models/Stable-diffusion/`
- Run `webui.bat` (Windows) or `webui.sh` (Mac/Linux)
- Open your browser at `http://localhost:7860`
Option 2: ComfyUI (most powerful, node-based)
- Download ComfyUI from github.com/comfyanonymous/ComfyUI
- Add a model to `models/checkpoints/`
- Run the launch script and open the browser interface
- Build generation pipelines visually — nodes connect into workflows you can share and remix
Option 3: Forge (AUTOMATIC1111 but faster)
- A performance-optimised fork of AUTOMATIC1111
- Same interface, noticeably faster on lower-VRAM GPUs
- Good choice if you're on an older or mid-range GPU
Hardware requirements:
- NVIDIA GPU with 4 GB+ VRAM — baseline, slow
- 8 GB VRAM — comfortable for SD 1.5 and SDXL
- 12 GB+ VRAM — smooth for SDXL and FLUX
- Apple Silicon (M1/M2/M3) — works well via MPS backend
- CPU only — possible but very slow (10–30 minutes per image)
Budget 1–2 hours for your first setup. After that, adding models and LoRAs takes minutes.
Key concepts to understand
Checkpoints are the base model files (2–7 GB each). They determine the foundational style and capability.
LoRAs (Low-Rank Adaptations) are small style or character files (50–200 MB) that you layer on top of a base model. They bend the output toward a specific aesthetic, subject, or style without replacing the base model.
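The "low-rank" in LoRA is literal: instead of shipping a full replacement weight matrix, a LoRA ships two small factor matrices whose product is added onto the base weights. A toy sketch, with made-up 2×2 matrices standing in for the model's real attention weights (`apply_lora`, `matmul`, and the shapes here are illustrative, not the actual file format):

```python
# Toy illustration of a LoRA weight update: W' = W + scale * (B @ A),
# where A and B are low-rank factors far smaller than W itself.
# Real LoRAs apply this to attention layers inside the diffusion model.

def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def apply_lora(W, A, B, scale=1.0):
    """Add the scaled low-rank product B @ A onto base weights W."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[0.0, 0.0], [0.0, 0.0]]   # base weights (stand-in)
A = [[3.0, 4.0]]               # rank-1 factor, shape 1x2
B = [[1.0], [2.0]]             # rank-1 factor, shape 2x1
merged = apply_lora(W, A, B, scale=0.5)
```

The `scale` knob is why UIs let you dial a LoRA's strength up or down: at 0 the base model is untouched, and higher values push the weights further toward the adaptation.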
Samplers control how the image is generated. DDIM and DPM++ 2M Karras are reliable defaults. Experimenting with samplers can produce noticeably different results from the same prompt.
CFG Scale (Classifier-Free Guidance) controls how strictly the image follows your prompt. 7–9 is the standard range. Higher values produce more literal but sometimes strange results.
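Under the hood, classifier-free guidance combines two noise predictions per step, one with your prompt and one without, and the CFG scale controls how far the result is pushed toward the prompted one. A minimal sketch, using scalars in place of the real noise-prediction tensors:

```python
# Classifier-free guidance at a single denoising step (simplified:
# floats stand in for the model's noise-prediction tensors).
def cfg_combine(uncond_pred, cond_pred, cfg_scale):
    # Push the prediction away from the unconditional output, toward
    # the prompt-conditioned one. Higher cfg_scale = more literal,
    # at the risk of oversaturated or distorted images.
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```

At a scale of 1.0 this reduces to the prompt-conditioned prediction alone; the standard 7–9 range amplifies the difference between the two predictions severalfold, which is exactly the "more literal but sometimes strange" behaviour described above.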
Steps — more steps mean more detail but slower generation. 20 steps for quick previews, 30–50 for final outputs.
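Why do returns diminish past 30–50 steps? Each step removes a fraction of the *remaining* noise, so the residual shrinks quickly at first and then flattens out. A toy model (the fixed 20% removal rate is an assumption for illustration; real samplers follow model-predicted noise schedules):

```python
# Toy denoising loop: each step strips a fixed fraction of the noise
# that is still left. Real samplers (DDIM, DPM++ 2M Karras, ...) use
# learned predictions and varying schedules, but the diminishing-returns
# shape is similar.
def residual_noise(steps, rate=0.2):
    noise = 1.0
    for _ in range(steps):
        noise *= (1.0 - rate)
    return noise
```

Going from 20 to 50 steps changes the residual far less than going from 5 to 20, which is why 20 steps is fine for previews and doubling the count for a final render buys detail, not a different image.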
Where to find models and LoRAs
Civitai (civitai.com) is the central hub for community-made models, LoRAs, embeddings, and sample images. Every model has user previews, download counts, and ratings. Browse here to find styles you want before searching for prompts.
Hugging Face (huggingface.co) hosts official and research models in a more structured format. Good for base models and newer architectures.
Compare with similar tools
| Tool | Ease of use | Quality | Cost | Best for |
|---|---|---|---|---|
| Stable Diffusion | Hard | Excellent (with tuning) | Free | Power users, customisation, volume |
| FLUX | Easy (web) | Excellent | Free (Schnell) | Photorealism, beginners wanting quality |
| Midjourney | Easy | Excellent | $10–$60/mo | Artistic, commercial work |
| Ideogram | Easy | Good | Free tier | Text in images, design |
Pick Stable Diffusion when you want unlimited generation, deep customisation, or complete privacy. Pick FLUX if you want near-equivalent quality with a much easier start. Pick Midjourney if you want the most reliably beautiful results without any setup.