6.5. stable-diffusion.cpp Acceleration
stable-diffusion.cpp brings the power of local image generation to Backend.AI GO. Just as llama.cpp revolutionized local LLM inference, stable-diffusion.cpp enables efficient, high-quality image generation directly on your hardware.
What is stable-diffusion.cpp?
stable-diffusion.cpp is a pure C/C++ implementation of Stable Diffusion inference, optimized for consumer hardware.
- Repository: github.com/leejet/stable-diffusion.cpp
- Creator: leejet (@leejet)
The project brings the same philosophy as llama.cpp to image generation: minimal dependencies, maximum efficiency, and broad hardware compatibility.
Why is it Special?
1. Cross-Platform Efficiency
Like llama.cpp, stable-diffusion.cpp is a self-contained solution that works across Windows, macOS, and Linux without heavy dependencies on Python or large ML frameworks.
2. Hardware Acceleration
The project supports multiple acceleration backends:
| Platform | Backend | Description |
|---|---|---|
| macOS | Metal | Optimized for Apple Silicon (M1-M5) with unified memory support |
| Windows | CUDA | High performance with NVIDIA GPUs |
| Windows | Vulkan | Cross-vendor GPU support (AMD, Intel, NVIDIA) |
| Linux | CUDA / ROCm | Support for NVIDIA and AMD GPUs |
| CPU | AVX2 / AVX-512 | Fallback for any machine |
3. Memory Efficient
The implementation uses memory-mapped loading and efficient memory management, allowing you to run diffusion models even on systems with limited VRAM.
Supported Model Types
Backend.AI GO supports multiple diffusion model architectures through stable-diffusion.cpp:
| Model Type | Description | Typical Size |
|---|---|---|
| SD 1.x | Original Stable Diffusion (1.4, 1.5) | ~2-4 GB |
| SD 2.x | Stable Diffusion 2.0, 2.1 | ~4-5 GB |
| SDXL | Stable Diffusion XL | ~6-7 GB |
| SD3 | Stable Diffusion 3 / 3.5 (multi-file: CLIP-L + CLIP-G + T5-XXL) | ~10+ GB |
| Flux | Black Forest Labs Flux architecture (multi-file: VAE + CLIP-L + T5-XXL) | ~12+ GB |
| Chroma | Flux-derived architecture (multi-file: VAE + T5-XXL) | ~10+ GB |
| Qwen-Image | Alibaba's Qwen image generation (multi-file: VAE + Qwen2.5-VL encoder) | Varies |
| Z-Image | Z-Image / Z-Image Turbo (multi-file: VAE + Qwen3-4B encoder) | ~5+ GB |
Role in Backend.AI GO
Integration
Backend.AI GO bundles the sd-server binary as a sidecar process. When you load a diffusion model:
- The app spawns `sd-server` as a dedicated background process.
- It provides an OpenAI-compatible `/v1/images/generations` API endpoint.
- Multiple diffusion models can be loaded simultaneously (pool management with LRU eviction).
- The process is fully managed with health checks and automatic cleanup.
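As a rough illustration of what a client call to the OpenAI-compatible endpoint might look like, the sketch below builds a request in the shape of the OpenAI Images API and decodes a base64 response. It assumes the model is served on port 39100 (the first port of the sd-server pool) and that the server accepts the `prompt`, `n`, `size`, and `response_format` fields; neither is confirmed here, so check the port and fields your instance actually uses.

```python
import base64
import json
import urllib.request

# Assumption: first port of the sd-server pool; the port actually assigned
# to a loaded model may differ.
BASE_URL = "http://127.0.0.1:39100"


def build_image_request(prompt: str, size: str = "512x512", n: int = 1) -> urllib.request.Request:
    """Build a POST request in the shape of the OpenAI Images API."""
    payload = {
        "prompt": prompt,
        "n": n,
        "size": size,
        "response_format": "b64_json",  # ask for base64 image data, not a URL
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response_body: bytes) -> list[bytes]:
    """Extract raw image bytes from an OpenAI-style response body."""
    body = json.loads(response_body)
    return [base64.b64decode(item["b64_json"]) for item in body["data"]]


def generate(prompt: str) -> list[bytes]:
    """Send the request and return the decoded images (needs a running server)."""
    with urllib.request.urlopen(build_image_request(prompt), timeout=300) as response:
        return decode_images(response.read())
```

With a model loaded, `generate("a lighthouse at dusk")` would return a list of image byte strings that can be written straight to `.png` files.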
Port Management
The sd-server pool uses ports 39100-39119, separate from the llama-server pool. This allows running both text generation and image generation simultaneously without port conflicts.
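To make the interaction between the port range and LRU eviction concrete, here is a toy model of such a pool. Only the port range 39100-39119 comes from this page; the class name, the `max_loaded` limit, and the "lowest free port" policy are hypothetical, and this is not the app's actual allocator.

```python
from collections import OrderedDict

# Port range taken from the text above; everything else is illustrative.
SD_PORTS = range(39100, 39120)  # 39100-39119 inclusive


class SdServerPool:
    """Toy model of a server pool: one port per loaded model, LRU eviction."""

    def __init__(self, max_loaded: int = 2):
        # max_loaded is a hypothetical limit, capped by the number of ports.
        self.max_loaded = min(max_loaded, len(SD_PORTS))
        self._ports = OrderedDict()  # model name -> port, least recently used first

    def acquire(self, model: str) -> int:
        """Return the port for `model`, evicting the LRU entry if the pool is full."""
        if model in self._ports:
            self._ports.move_to_end(model)  # mark as most recently used
            return self._ports[model]
        if len(self._ports) >= self.max_loaded:
            # A real pool would also stop the evicted server process here.
            self._ports.popitem(last=False)
        in_use = set(self._ports.values())
        port = next(p for p in SD_PORTS if p not in in_use)
        self._ports[model] = port
        return port

    def loaded(self) -> list[str]:
        """Model names ordered from least to most recently used."""
        return list(self._ports)
```

In this sketch, loading a third model into a pool of two evicts whichever model was used least recently and reuses its port.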
Key Settings
When configuring image generation in Backend.AI GO:
- GPU Layers: Controls how many layers are offloaded to the GPU. Use `-1` to offload everything for maximum speed.
- Threads: Number of CPU threads to use. Auto-detected if not specified.
- CFG Scale: Classifier-Free Guidance scale (typically 7.0-8.0). Higher values follow the prompt more closely.
- Sampling Steps: More steps generally mean higher quality but slower generation. 20-30 is a good balance.
- Sampler: The sampling algorithm (`euler`, `euler_a`, `dpm++2m`, etc.). `euler_a` is a good default.
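The settings above can be pictured as a small configuration object. The sketch below is illustrative only: the field names, the defaults within the suggested ranges, and the use of `os.cpu_count()` for thread auto-detection are assumptions, not the app's actual configuration schema.

```python
import os
from dataclasses import dataclass
from typing import Optional

# Sampler names mentioned in this section; the real list may be longer.
KNOWN_SAMPLERS = {"euler", "euler_a", "dpm++2m"}


@dataclass
class GenerationSettings:
    """Hypothetical container for the settings described above."""

    gpu_layers: int = -1            # -1 offloads every layer to the GPU
    threads: Optional[int] = None   # None means "auto-detect"
    cfg_scale: float = 7.0
    steps: int = 20
    sampler: str = "euler_a"

    def resolved_threads(self) -> int:
        """Fall back to the machine's CPU count when threads is unset (assumed policy)."""
        return self.threads if self.threads is not None else (os.cpu_count() or 1)

    def warnings(self) -> list[str]:
        """Flag values outside the ranges suggested in this section."""
        notes = []
        if not 7.0 <= self.cfg_scale <= 8.0:
            notes.append(f"cfg_scale {self.cfg_scale} is outside the typical 7.0-8.0 range")
        if not 20 <= self.steps <= 30:
            notes.append(f"steps {self.steps} is outside the suggested 20-30 range")
        if self.sampler not in KNOWN_SAMPLERS:
            notes.append(f"sampler {self.sampler!r} is not one of the names listed here")
        return notes
```

Values outside the suggested ranges are reported rather than rejected, since they are guidelines and some models may work best with different settings.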
Supported Formats
Backend.AI GO accepts diffusion models in the following formats:
- `.safetensors`: The recommended format. Safe, fast to load, and widely supported.
- `.ckpt`: Legacy checkpoint format. Works, but safetensors is preferred.
- `.gguf`: Quantized format for reduced memory usage.
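If you want to pre-check files before importing them, a classification by extension along the lines of the list above might look like the following. The function names are hypothetical, and the app's own detection logic may differ; the only format-level fact used is that GGUF files begin with the four bytes `GGUF`.

```python
from pathlib import Path

# Extensions listed above, mapped to short descriptions.
SUPPORTED_FORMATS = {
    ".safetensors": "safetensors (recommended)",
    ".ckpt": "legacy checkpoint",
    ".gguf": "quantized GGUF",
}


def detect_format(path: str) -> str:
    """Classify a model file by extension, case-insensitively."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported model format: {suffix or '(none)'}")
    return SUPPORTED_FORMATS[suffix]


def looks_like_gguf(path: str) -> bool:
    """Check the 4-byte magic that every GGUF file starts with."""
    with open(path, "rb") as handle:
        return handle.read(4) == b"GGUF"
```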
Model Components
Multi-file models (SD3, Flux, Chroma, Qwen-Image, Z-Image) load a main diffusion model plus one or more companion files:
- VAE: Variational Auto-Encoder for image encoding/decoding (Flux, Chroma, Qwen-Image, Z-Image).
- CLIP-L: CLIP text encoder (SDXL, SD3, Flux).
- CLIP-G: Second CLIP text encoder (SDXL, SD3).
- T5-XXL: T5 text encoder (SD3, Flux, Chroma).
- LLM: An LLM text encoder: Qwen2.5-VL for Qwen-Image, Qwen3-4B for Z-Image.
In the model's Components tab, Backend.AI GO shows the companion files each multi-file model requires, downloads any that are missing with one click, and validates that all required files are present before the model can load.
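The "all required files present" check can be sketched as a set difference. The mapping below copies the companion lists from the Supported Model Types table; the function names are hypothetical, and the app's real validation may check more than file presence.

```python
# Companion files per architecture, as listed in the Supported Model Types table.
REQUIRED_COMPONENTS = {
    "SD3": {"CLIP-L", "CLIP-G", "T5-XXL"},
    "Flux": {"VAE", "CLIP-L", "T5-XXL"},
    "Chroma": {"VAE", "T5-XXL"},
    "Qwen-Image": {"VAE", "LLM"},  # LLM = Qwen2.5-VL encoder
    "Z-Image": {"VAE", "LLM"},     # LLM = Qwen3-4B encoder
}


def missing_components(model_type: str, present: set[str]) -> list[str]:
    """Return the required companions that are not yet available, sorted by name."""
    # Architectures absent from the mapping are treated as single-file models.
    required = REQUIRED_COMPONENTS.get(model_type, set())
    return sorted(required - present)


def can_load(model_type: str, present: set[str]) -> bool:
    """A model may load only when nothing it requires is missing."""
    return not missing_components(model_type, present)
```

For instance, a Flux model with only its VAE on disk would report CLIP-L and T5-XXL as missing and would not be allowed to load.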
Shared companion store
Companion files are kept in a single shared store: each file is downloaded once and then reused by every model that needs it. For example, Flux, Chroma, and SD3 all use the T5-XXL encoder, so once you download t5xxl_fp16.safetensors for one of them, the others reuse the same file instead of downloading it again. You can also point a model at a manually placed companion file sitting next to it on disk.
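One way to picture the lookup is a two-step search: a file next to the model, then the shared store. The sketch below is a guess at that behavior; the function name, the directory layout, and in particular the order of precedence are assumptions rather than documented behavior.

```python
from pathlib import Path
from typing import Optional


def resolve_companion(filename: str, model_path: Path, shared_store: Path) -> Optional[Path]:
    """Find a companion file without downloading it.

    Checks next to the model first, then the shared store (assumed order).
    Returns None when the file still has to be downloaded.
    """
    for directory in (model_path.parent, shared_store):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

Because every model resolves to the same path in the shared store, a large encoder such as t5xxl_fp16.safetensors occupies disk space only once, however many models depend on it.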
Backend.AI GO enables local image generation through the innovative work of leejet and the stable-diffusion.cpp community.