
4.5. stable-diffusion.cpp Acceleration

stable-diffusion.cpp brings the power of local image generation to Backend.AI GO. Just as llama.cpp revolutionized local LLM inference, stable-diffusion.cpp enables efficient, high-quality image generation directly on your hardware.

What is stable-diffusion.cpp?

stable-diffusion.cpp is a pure C/C++ implementation of Stable Diffusion inference, optimized for consumer hardware.

The project brings the same philosophy as llama.cpp to image generation: minimal dependencies, maximum efficiency, and broad hardware compatibility.

Why is it Special?

1. Cross-Platform Efficiency

Like llama.cpp, stable-diffusion.cpp is a self-contained solution that works across Windows, macOS, and Linux without heavy dependencies on Python or large ML frameworks.

2. Hardware Acceleration

The project supports multiple acceleration backends:

| Platform | Backend | Description |
| --- | --- | --- |
| macOS | Metal | Optimized for Apple Silicon (M1-M5) with unified memory support |
| Windows | CUDA | High performance with NVIDIA GPUs |
| Windows | Vulkan | Cross-vendor GPU support (AMD, Intel, NVIDIA) |
| Linux | CUDA / ROCm | Support for NVIDIA and AMD GPUs |
| CPU | AVX2 / AVX-512 | Fallback for any machine |

3. Memory Efficiency

The implementation uses memory-mapped loading and efficient memory management, allowing you to run diffusion models even on systems with limited VRAM.
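The effect of memory-mapped loading can be illustrated with a small, generic Python sketch (not Backend.AI GO's actual loader): the OS pages in only the bytes that are touched, so a multi-gigabyte weight file need not be read into RAM up front.

```python
import mmap
import os
import tempfile

def read_header(path: str, n: int = 16) -> bytes:
    """Read the first n bytes of a file via mmap, without loading it fully."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the pages backing this slice are faulted into memory.
            return mm[:n]

# Stand-in for a large checkpoint file (1 MiB of mostly zeros).
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(b"SAFE" + b"\x00" * (1024 * 1024 - 4))

header = read_header(path, 4)
```

The same principle lets stable-diffusion.cpp begin inference before the entire model has been read from disk.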

Supported Model Types

Backend.AI GO supports multiple diffusion model architectures through stable-diffusion.cpp:

| Model Type | Description | Typical Size |
| --- | --- | --- |
| SD 1.x | Original Stable Diffusion (1.4, 1.5) | ~2-4 GB |
| SD 2.x | Stable Diffusion 2.0, 2.1 | ~4-5 GB |
| SDXL | Stable Diffusion XL | ~6-7 GB |
| Flux | Black Forest Labs Flux architecture | ~12+ GB |
| Qwen-Image | Alibaba's Qwen image generation | Varies |

Role in Backend.AI GO

Integration

Backend.AI GO bundles the sd-server binary as a sidecar process. When you load a diffusion model:

  1. The app spawns sd-server as a dedicated background process.

  2. It provides an OpenAI-compatible /v1/images/generations API endpoint.

  3. Multiple diffusion models can be loaded simultaneously (pool management with LRU eviction).

  4. The process is fully managed with health checks and automatic cleanup.
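Because the endpoint follows the OpenAI images API shape, it can be called with any HTTP client. A minimal sketch of building such a request (the port, model name, and prompt here are assumptions for illustration):

```python
import json
import urllib.request

def build_image_request(base_url: str, model: str, prompt: str,
                        size: str = "512x512") -> urllib.request.Request:
    """Build an OpenAI-style /v1/images/generations POST request."""
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        url=f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical local sd-server on the first pool port.
req = build_image_request("http://127.0.0.1:39100",
                          "sd15-example", "a watercolor fox")
# urllib.request.urlopen(req) would send it once a model is loaded.
```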

Port Management

The sd-server pool uses ports 39100-39119, separate from the llama-server pool. This allows running both text generation and image generation simultaneously without port conflicts.
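The pool behavior described above (20 ports, LRU eviction when full) can be sketched as a small map from model name to port. This is a simplified illustration, not the actual implementation:

```python
from collections import OrderedDict

class PortPool:
    """Assign ports 39100-39119 to models, evicting the least recently used."""

    def __init__(self, start: int = 39100, end: int = 39119):
        self.free = list(range(start, end + 1))
        self.active = OrderedDict()  # model name -> port, oldest first

    def acquire(self, model: str) -> int:
        if model in self.active:
            self.active.move_to_end(model)   # mark as most recently used
            return self.active[model]
        if not self.free:                    # pool full: evict the LRU model
            _, port = self.active.popitem(last=False)
            self.free.append(port)
        port = self.free.pop(0)
        self.active[model] = port
        return port

pool = PortPool()
first_port = pool.acquire("sd15-example")  # gets 39100
```

In the real system, eviction would also shut down the evicted model's sd-server process before its port is reused.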

Key Settings

When configuring image generation in Backend.AI GO:

  • GPU Layers: Controls how many layers are offloaded to GPU. Use -1 to offload everything for maximum speed.

  • Threads: Number of CPU threads to use. Auto-detected if not specified.

  • CFG Scale: Classifier-Free Guidance scale (typically 7.0-8.0). Higher values follow the prompt more closely.

  • Sampling Steps: More steps generally mean higher quality but slower generation. 20-30 is a good balance.

  • Sampler: The sampling algorithm (euler, euler_a, dpm++2m, etc.). euler_a is a good default.
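Taken together, the recommendations above suggest a balanced default configuration. The sketch below expresses them as a settings dictionary; the field names are illustrative, not Backend.AI GO's actual schema:

```python
def default_generation_settings(**overrides) -> dict:
    """Balanced image-generation defaults reflecting the guidance above."""
    settings = {
        "gpu_layers": -1,      # -1 offloads every layer to the GPU
        "threads": None,       # None -> let the runtime auto-detect
        "cfg_scale": 7.0,      # typical range is 7.0-8.0
        "steps": 25,           # 20-30 balances quality and speed
        "sampler": "euler_a",  # a good default sampler
    }
    settings.update(overrides)
    if not 1.0 <= settings["cfg_scale"] <= 30.0:
        raise ValueError("cfg_scale out of a sensible range")
    return settings
```

Callers can override any field, e.g. `default_generation_settings(steps=30)` for higher quality at the cost of speed.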

Supported Formats

Backend.AI GO accepts diffusion models in the following formats:

  • .safetensors: The recommended format. Safe, fast to load, and widely supported.

  • .ckpt: Legacy checkpoint format. Works but safetensors is preferred.

  • .gguf: Quantized format for reduced memory usage.
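A loader distinguishing these formats would typically dispatch on the file extension. A minimal sketch with a hypothetical helper (not Backend.AI GO's actual code):

```python
from pathlib import Path

# Extensions accepted for diffusion models, per the list above.
SUPPORTED = {".safetensors", ".ckpt", ".gguf"}

def model_format(path: str) -> str:
    """Return the diffusion model format for a file, or raise if unsupported."""
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED:
        raise ValueError(f"unsupported model format: {ext or path}")
    return ext.lstrip(".")
```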

Model Components

Some models (especially SDXL and Flux) may require additional components:

  • VAE: Variational Auto-Encoder for image encoding/decoding.

  • CLIP-L: Text encoder for SDXL models.

  • T5-XXL: Text encoder for Flux models.

Backend.AI GO allows you to specify these optional components when loading a model.
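A load request with optional components might be assembled as below. The helper and its field names are hypothetical, shown only to illustrate that the VAE, CLIP-L, and T5-XXL paths are optional extras alongside the main model:

```python
from pathlib import Path
from typing import Optional

def component_args(model: str,
                   vae: Optional[str] = None,
                   clip_l: Optional[str] = None,
                   t5xxl: Optional[str] = None) -> dict:
    """Collect the model path plus any optional component paths."""
    args = {"model": model}
    for name, path in (("vae", vae), ("clip_l", clip_l), ("t5xxl", t5xxl)):
        if path is not None:
            args[name] = str(Path(path))  # omit components not supplied
    return args
```

For an SD 1.x model, `component_args("v1-5.safetensors")` alone suffices; a Flux model would typically also pass `vae` and `t5xxl` paths.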


Backend.AI GO enables local image generation through the innovative work of leejet and the stable-diffusion.cpp community.