What is Stable Diffusion?
Stable Diffusion is the foundation of open-source AI image generation. First released by Stability AI in 2022, it runs on your own hardware — no cloud service, no monthly fee, no usage limits. The model weights are freely available, and an enormous community has built on top of them: thousands of community-trained models, style adaptors called LoRAs, inpainting tools, upscalers, and powerful visual interfaces like ComfyUI and AUTOMATIC1111.
It is the most customisable image AI in existence. It is also the most complex to set up.
Which version to use
The Stable Diffusion ecosystem can be confusing — different base models, different UIs. Here's the practical breakdown:
| Model | Best for | Notes |
|---|---|---|
| SD 1.5 | Widest community support, most LoRAs | Older but most-customised |
| SDXL | Higher resolution, better prompt following | Needs 8+ GB VRAM |
| SD 3.x | Latest, best text handling | Newer, fewer community models |
| FLUX | Best image quality, good text | By Black Forest Labs — different architecture |
For most new users: start with SDXL for quality. If you want to build on a massive ecosystem of community styles, SD 1.5 has the most resources.
The magic moment
After the painful setup is behind you, download a community LoRA trained on a specific illustration style — say, Studio Ghibli watercolour, or a specific architectural photography aesthetic. Apply it to any prompt and watch every image shift to that style, consistently, without any prompting tricks. That's the moment it clicks: this is not a product with guardrails. This is a tool that bends to your vision. The ceiling is unlimited.
Setup options
Option 1: AUTOMATIC1111 (most features, most documented)
- Install Python 3.10 and Git
- Clone the repo: `git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui`
- Download a model checkpoint from Civitai or Hugging Face and place it in `models/Stable-diffusion/`
- Run `webui.bat` (Windows) or `webui.sh` (Mac/Linux)
- Open your browser at `http://localhost:7860`
Option 2: ComfyUI (most powerful, node-based)
- Download ComfyUI from github.com/comfyanonymous/ComfyUI
- Add a model to `models/checkpoints/`
- Run the launch script and open the browser interface
- Build generation pipelines visually — nodes connect into workflows you can share and remix
Option 3: Forge (AUTOMATIC1111 but faster)
- A performance-optimised fork of AUTOMATIC1111
- Same interface, noticeably faster on lower-VRAM GPUs
- Good choice if you're on an older or mid-range GPU
Hardware requirements:
- NVIDIA GPU with 4 GB+ VRAM — baseline, slow
- 8 GB VRAM — comfortable for SD 1.5 and SDXL
- 12 GB+ VRAM — smooth for SDXL and FLUX
- Apple Silicon (M1/M2/M3) — works well via MPS backend
- CPU only — possible but very slow (10–30 minutes per image)
Budget 1–2 hours for your first setup. After that, adding models and LoRAs takes minutes.
Key concepts to understand
Checkpoints are the base model files (2–7 GB each). They determine the foundational style and capability.
LoRAs (Low-Rank Adaptations) are small style or character files (50–200 MB) that you layer on top of a base model. They bend the output toward a specific aesthetic, subject, or style without replacing the base model.
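The "low-rank" in LoRA is literal: instead of shipping a full replacement weight matrix, a LoRA ships two small factor matrices whose product is added onto the base weights. A toy sketch, with made-up 2×2 matrices standing in for the model's real attention weights (`apply_lora`, `matmul`, and the shapes here are illustrative, not the actual file format):

```python
# Toy illustration of a LoRA weight update: W' = W + scale * (B @ A),
# where A and B are low-rank factors far smaller than W itself.
# Real LoRAs apply this to attention layers inside the diffusion model.

def matmul(X, Y):
    """Plain-Python matrix multiply for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def apply_lora(W, A, B, scale=1.0):
    """Add the scaled low-rank product B @ A onto base weights W."""
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

W = [[0.0, 0.0], [0.0, 0.0]]   # base weights (stand-in)
A = [[3.0, 4.0]]               # rank-1 factor, shape 1x2
B = [[1.0], [2.0]]             # rank-1 factor, shape 2x1
merged = apply_lora(W, A, B, scale=0.5)
```

The `scale` knob is why UIs let you dial a LoRA's strength up or down: at 0 the base model is untouched, and higher values push the weights further toward the adaptation.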
Samplers control how the image is generated. DDIM and DPM++ 2M Karras are reliable defaults. Experimenting with samplers can produce noticeably different results from the same prompt.
CFG Scale (Classifier-Free Guidance) controls how strictly the image follows your prompt. 7–9 is the standard range. Higher values produce more literal but sometimes strange results.
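Under the hood, classifier-free guidance combines two noise predictions per step, one with your prompt and one without, and the CFG scale controls how far the result is pushed toward the prompted one. A minimal sketch, using scalars in place of the real noise-prediction tensors:

```python
# Classifier-free guidance at a single denoising step (simplified:
# floats stand in for the model's noise-prediction tensors).
def cfg_combine(uncond_pred, cond_pred, cfg_scale):
    # Push the prediction away from the unconditional output, toward
    # the prompt-conditioned one. Higher cfg_scale = more literal,
    # at the risk of oversaturated or distorted images.
    return uncond_pred + cfg_scale * (cond_pred - uncond_pred)
```

At a scale of 1.0 this reduces to the prompt-conditioned prediction alone; the standard 7–9 range amplifies the difference between the two predictions severalfold, which is exactly the "more literal but sometimes strange" behaviour described above.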
Steps — more steps mean more detail but slower generation. 20 steps for quick previews, 30–50 for final outputs.
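Why do returns diminish past 30–50 steps? Each step removes a fraction of the *remaining* noise, so the residual shrinks quickly at first and then flattens out. A toy model (the fixed 20% removal rate is an assumption for illustration; real samplers follow model-predicted noise schedules):

```python
# Toy denoising loop: each step strips a fixed fraction of the noise
# that is still left. Real samplers (DDIM, DPM++ 2M Karras, ...) use
# learned predictions and varying schedules, but the diminishing-returns
# shape is similar.
def residual_noise(steps, rate=0.2):
    noise = 1.0
    for _ in range(steps):
        noise *= (1.0 - rate)
    return noise
```

Going from 20 to 50 steps changes the residual far less than going from 5 to 20, which is why 20 steps is fine for previews and doubling the count for a final render buys detail, not a different image.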
Where to find models and LoRAs
Civitai (civitai.com) is the central hub for community-made models, LoRAs, embeddings, and sample images. Every model has user previews, download counts, and ratings. Browse here to find styles you want before searching for prompts.
Hugging Face (huggingface.co) hosts official and research models in a more structured format. Good for base models and newer architectures.
Compare with similar tools
| Tool | Ease of use | Quality | Cost | Best for |
|---|---|---|---|---|
| Stable Diffusion | Hard | Excellent (with tuning) | Free | Power users, customisation, volume |
| FLUX | Easy (web) | Excellent | Free (Schnell) | Photorealism, beginners wanting quality |
| Midjourney | Easy | Excellent | $10–$60/mo | Artistic, commercial work |
| Ideogram | Easy | Good | Free tier | Text in images, design |
Pick Stable Diffusion when you want unlimited generation, deep customisation, or complete privacy. Pick FLUX if you want near-equivalent quality with a much easier start. Pick Midjourney if you want the most reliably beautiful results without any setup.