4.5. stable-diffusion.cpp Acceleration¶
stable-diffusion.cpp brings the power of local image generation to Backend.AI GO. Just as llama.cpp revolutionized local LLM inference, stable-diffusion.cpp enables efficient, high-quality image generation directly on your hardware.
What is stable-diffusion.cpp?¶
stable-diffusion.cpp is a pure C/C++ implementation of Stable Diffusion inference, optimized for consumer hardware.
- Repository: github.com/leejet/stable-diffusion.cpp
- Creator: leejet (@leejet)
The project brings the same philosophy as llama.cpp to image generation: minimal dependencies, maximum efficiency, and broad hardware compatibility.
Why is it Special?¶
1. Cross-Platform Efficiency¶
Like llama.cpp, stable-diffusion.cpp is a self-contained solution that works across Windows, macOS, and Linux without heavy dependencies on Python or large ML frameworks.
2. Hardware Acceleration¶
The project supports multiple acceleration backends:
| Platform | Backend | Description |
|---|---|---|
| macOS | Metal | Optimized for Apple Silicon (M1-M5) with unified memory support |
| Windows | CUDA | High performance with NVIDIA GPUs |
| Windows | Vulkan | Cross-vendor GPU support (AMD, Intel, NVIDIA) |
| Linux | CUDA / ROCm | Support for NVIDIA and AMD GPUs |
| CPU | AVX2 / AVX-512 | Fallback for any machine |
3. Memory Efficient¶
The implementation uses memory-mapped loading and efficient memory management, allowing you to run diffusion models even on systems with limited VRAM.
Supported Model Types¶
Backend.AI GO supports multiple diffusion model architectures through stable-diffusion.cpp:
| Model Type | Description | Typical Size |
|---|---|---|
| SD 1.x | Original Stable Diffusion (1.4, 1.5) | ~2-4 GB |
| SD 2.x | Stable Diffusion 2.0, 2.1 | ~4-5 GB |
| SDXL | Stable Diffusion XL | ~6-7 GB |
| Flux | Black Forest Labs Flux architecture | ~12+ GB |
| Qwen-Image | Alibaba's Qwen image generation | Varies |
Role in Backend.AI GO¶
Integration¶
Backend.AI GO bundles the sd-server binary as a sidecar process. When you load a diffusion model:
- The app spawns `sd-server` as a dedicated background process.
- It provides an OpenAI-compatible `/v1/images/generations` API endpoint.
- Multiple diffusion models can be loaded simultaneously (pool management with LRU eviction).
- The process is fully managed with health checks and automatic cleanup.
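Because the endpoint mirrors OpenAI's Images API, a request can be sketched as below. The host, default port, and extra payload fields are assumptions for illustration; in practice Backend.AI GO manages the server address for you.

```python
import json
import urllib.request

def build_image_request(prompt: str, host: str = "127.0.0.1", port: int = 39100,
                        size: str = "512x512", n: int = 1) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible images endpoint.

    The port here is assumed to be the first in the sd-server pool; the
    actual port is assigned by Backend.AI GO at load time.
    """
    url = f"http://{host}:{port}/v1/images/generations"
    payload = {"prompt": prompt, "size": size, "n": n,
               "response_format": "b64_json"}  # ask for base64-encoded images
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_image_request("a watercolor lighthouse at dusk")
```

Sending the request with `urllib.request.urlopen(req)` returns a JSON body whose structure follows the OpenAI Images API.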
Port Management¶
The sd-server pool uses ports 39100-39119, separate from the llama-server pool. This allows running both text generation and image generation simultaneously without port conflicts.
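A toy model of such a port pool with LRU eviction is sketched below; the class and its interface are illustrative, not Backend.AI GO's actual implementation.

```python
from collections import OrderedDict

class SdServerPortPool:
    """Illustrative port pool with LRU eviction (not the real implementation)."""

    def __init__(self, start: int = 39100, end: int = 39119):
        self.free = list(range(start, end + 1))
        self.in_use: OrderedDict[str, int] = OrderedDict()  # model -> port, LRU first

    def acquire(self, model: str) -> int:
        if model in self.in_use:               # already loaded: mark as recently used
            self.in_use.move_to_end(model)
            return self.in_use[model]
        if not self.free:                      # pool exhausted: evict least recently used
            _, port = self.in_use.popitem(last=False)
            self.free.append(port)
        port = self.free.pop(0)
        self.in_use[model] = port
        return port

pool = SdServerPortPool(39100, 39102)  # tiny 3-port pool for demonstration
a = pool.acquire("sd15")   # gets 39100
b = pool.acquire("sdxl")   # gets 39101
c = pool.acquire("flux")   # gets 39102
d = pool.acquire("qwen")   # pool full: evicts "sd15" and reuses its port
```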
Key Settings¶
When configuring image generation in Backend.AI GO:
- GPU Layers: Controls how many layers are offloaded to the GPU. Use `-1` to offload everything for maximum speed.
- Threads: Number of CPU threads to use. Auto-detected if not specified.
- CFG Scale: Classifier-Free Guidance scale (typically 7.0-8.0). Higher values follow the prompt more closely.
- Sampling Steps: More steps generally mean higher quality but slower generation. 20-30 is a good balance.
- Sampler: The sampling algorithm (`euler`, `euler_a`, `dpm++2m`, etc.). `euler_a` is a good default.
Supported Formats¶
Backend.AI GO accepts diffusion models in the following formats:
- `.safetensors`: The recommended format. Safe, fast to load, and widely supported.
- `.ckpt`: Legacy checkpoint format. Works but safetensors is preferred.
- `.gguf`: Quantized format for reduced memory usage.
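A simple way to tell which of these formats a file uses is to check its extension. This helper is a sketch of that idea, not how Backend.AI GO actually classifies model files.

```python
from pathlib import Path

# Notes keyed by extension, matching the list above.
FORMAT_NOTES = {
    ".safetensors": "recommended: safe, fast to load, widely supported",
    ".ckpt": "legacy checkpoint; prefer safetensors",
    ".gguf": "quantized for reduced memory usage",
}

def classify_model_file(path: str) -> str:
    """Return a short note for a supported diffusion model file, or raise."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMAT_NOTES:
        raise ValueError(f"unsupported model format: {suffix or path}")
    return FORMAT_NOTES[suffix]
```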
Model Components¶
Some models (especially SDXL and Flux) may require additional components:
- VAE: Variational Auto-Encoder for image encoding/decoding.
- CLIP-L: Text encoder for SDXL models.
- T5-XXL: Text encoder for Flux models.
Backend.AI GO allows you to specify these optional components when loading a model.
Backend.AI GO enables local image generation through the innovative work of leejet and the stable-diffusion.cpp community.