6.5. stable-diffusion.cpp Acceleration

stable-diffusion.cpp provides local image generation in Backend.AI GO. Just as llama.cpp handles LLM inference, it runs efficient, high-quality image generation directly on your own hardware.

What is stable-diffusion.cpp?

stable-diffusion.cpp is a pure C/C++ implementation of Stable Diffusion inference, optimized for consumer hardware.

Repository: github.com/leejet/stable-diffusion.cpp
Creator: leejet (@leejet)

The project brings the same philosophy as llama.cpp to image generation: minimal dependencies, maximum efficiency, and broad hardware compatibility.

Why is it Special?

1. Cross-Platform Efficiency

Like llama.cpp, stable-diffusion.cpp is a self-contained solution that works across Windows, macOS, and Linux without heavy dependencies on Python or large ML frameworks.

2. Hardware Acceleration

The project supports multiple acceleration backends:

Platform	Backend	Description
macOS	Metal	Optimized for Apple Silicon (M1-M5) with unified memory support
Windows	CUDA	High performance with NVIDIA GPUs
Windows	Vulkan	Cross-vendor GPU support (AMD, Intel, NVIDIA)
Linux	CUDA / ROCm	Support for NVIDIA and AMD GPUs
CPU	AVX2 / AVX-512	Fallback for any machine

3. Memory Efficient

The implementation uses memory-mapped loading and efficient memory management, allowing you to run diffusion models even on systems with limited VRAM.

Supported Model Types

Backend.AI GO supports multiple diffusion model architectures through stable-diffusion.cpp:

Model Type	Description	Typical Size
SD 1.x	Original Stable Diffusion (1.4, 1.5)	~2-4 GB
SD 2.x	Stable Diffusion 2.0, 2.1	~4-5 GB
SDXL	Stable Diffusion XL	~6-7 GB
SD3	Stable Diffusion 3 / 3.5 (multi-file: CLIP-L + CLIP-G + T5-XXL)	~10+ GB
Flux	Black Forest Labs Flux architecture (multi-file: VAE + CLIP-L + T5-XXL)	~12+ GB
Chroma	Flux-derived architecture (multi-file: VAE + T5-XXL)	~10+ GB
Qwen-Image	Alibaba's Qwen image generation (multi-file: VAE + Qwen2.5-VL encoder)	Varies
Z-Image	Z-Image / Z-Image Turbo (multi-file: VAE + Qwen3-4B encoder)	~5+ GB

Role in Backend.AI GO

Integration

Backend.AI GO bundles the sd-server binary as a sidecar process. When you load a diffusion model:

The app spawns sd-server as a dedicated background process.
It provides an OpenAI-compatible /v1/images/generations API endpoint.
Multiple diffusion models can be loaded simultaneously; the pool evicts the least recently used (LRU) model when it needs room.
The process is fully managed with health checks and automatic cleanup.
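As a rough illustration of what "OpenAI-compatible" means here, the sketch below builds a request for the /v1/images/generations path using only the Python standard library. The field names (prompt, n, size, response_format) follow the OpenAI Images API; which of them sd-server honors, the port, and the shape of the response are assumptions, and the helper names are hypothetical.

```python
import json
import urllib.request


def build_image_request(port: int, prompt: str,
                        size: str = "512x512", n: int = 1) -> urllib.request.Request:
    """Build (but do not send) a POST for the local images endpoint.

    Assumption: the server accepts the OpenAI Images API field names.
    """
    body = {
        "prompt": prompt,
        "n": n,
        "size": size,
        "response_format": "b64_json",
    }
    return urllib.request.Request(
        url=f"http://127.0.0.1:{port}/v1/images/generations",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(port: int, prompt: str, timeout: float = 300.0) -> dict:
    """Send the request and return the decoded JSON response.

    Assumption: the response is JSON. Image generation can take a while,
    hence the generous timeout.
    """
    request = build_image_request(port, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling generate(39100, "a lighthouse at dusk, watercolor") would send the request to a server listening on port 39100; the real port is whichever one the app assigned to the loaded model.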

Port Management

The sd-server pool uses ports 39100-39119, separate from the llama-server pool. This allows running both text generation and image generation simultaneously without port conflicts.
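To make the pool behavior concrete, here is a toy model of a server pool that assigns each loaded model a port from 39100-39119 and evicts the least recently used model when the pool is full. It is a sketch under stated assumptions, not Backend.AI GO's implementation: the class name, the max_loaded parameter, and the lowest-free-port policy are all hypothetical.

```python
from collections import OrderedDict

# 39100-39119 inclusive, as stated in the text (20 ports).
PORT_RANGE = range(39100, 39120)


class ServerPool:
    """Toy pool: one port per loaded model, LRU eviction when full."""

    def __init__(self, max_loaded: int = len(PORT_RANGE)):
        # Never allow more models than ports, and always allow at least one.
        self.max_loaded = max(1, min(max_loaded, len(PORT_RANGE)))
        # Maps model name -> port; iteration order is least to most recently used.
        self._ports: "OrderedDict[str, int]" = OrderedDict()

    def acquire(self, model: str) -> int:
        """Return the port for `model`, loading it (and evicting) if needed."""
        if model in self._ports:
            self._ports.move_to_end(model)  # mark as most recently used
            return self._ports[model]
        if len(self._ports) >= self.max_loaded:
            self._ports.popitem(last=False)  # evict the least recently used
        used = set(self._ports.values())
        port = next(p for p in PORT_RANGE if p not in used)  # lowest free port
        self._ports[model] = port
        return port

    def loaded(self) -> list[str]:
        """Model names ordered from least to most recently used."""
        return list(self._ports)
```

With max_loaded=2, acquiring "a", "b", "a", and then "c" evicts "b" (the least recently used) and hands its port to "c".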

Key Settings

When configuring image generation in Backend.AI GO:

GPU Layers: Controls how many layers are offloaded to the GPU. Use -1 to offload everything for maximum speed.
Threads: Number of CPU threads to use. Auto-detected if not specified.
CFG Scale: Classifier-Free Guidance scale (typically 7.0-8.0). Higher values follow the prompt more closely.
Sampling Steps: More steps generally mean higher quality but slower generation. 20-30 is a good balance.
Sampler: The sampling algorithm (euler, euler_a, dpm++2m, etc.). euler_a is a good default.
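The settings above can be pictured as a small record with the defaults the text suggests. This is only a sketch: the field names and the notes() helper are hypothetical and do not reflect Backend.AI GO's actual configuration schema. The ranges checked are the "typical" ones quoted above, not hard limits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageGenSettings:
    gpu_layers: int = -1           # -1 = offload every layer to the GPU
    threads: Optional[int] = None  # None = leave thread count to auto-detection
    cfg_scale: float = 7.0         # text quotes 7.0-8.0 as typical
    steps: int = 20                # text suggests 20-30 as a good balance
    sampler: str = "euler_a"       # suggested default sampler

    def notes(self) -> list[str]:
        """Return advisory notes for values outside the ranges quoted above."""
        out = []
        if not 7.0 <= self.cfg_scale <= 8.0:
            out.append(f"cfg_scale {self.cfg_scale} is outside the typical 7.0-8.0 range")
        if not 20 <= self.steps <= 30:
            out.append(f"steps {self.steps} is outside the suggested 20-30 range")
        return out
```

ImageGenSettings().notes() returns an empty list, while ImageGenSettings(steps=8, cfg_scale=12.0).notes() returns two advisory messages.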

Supported Formats

Backend.AI GO accepts diffusion models in the following formats:

.safetensors: The recommended format. Safe, fast to load, and widely supported.
.ckpt: Legacy checkpoint format. It works, but .safetensors is preferred.
.gguf: Quantized format for reduced memory usage.
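A minimal sketch of recognizing these formats by file extension follows. The function name is hypothetical, and the app may well inspect file contents rather than relying on the extension alone.

```python
from pathlib import Path
from typing import Optional

# Extensions listed above, mapped to a short format label.
SUPPORTED_FORMATS = {
    ".safetensors": "safetensors",
    ".ckpt": "ckpt",
    ".gguf": "gguf",
}


def model_format(path: str) -> Optional[str]:
    """Return the format label for a model file, or None if unsupported."""
    return SUPPORTED_FORMATS.get(Path(path).suffix.lower())
```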

Model Components

Multi-file models (SD3, Flux, Chroma, Qwen-Image, Z-Image) load a main diffusion model plus one or more companion files:

VAE: Variational Auto-Encoder for image encoding/decoding (Flux, Chroma, Qwen-Image, Z-Image).
CLIP-L: CLIP text encoder (SDXL, SD3, Flux).
CLIP-G: Second CLIP text encoder (SDXL, SD3).
T5-XXL: T5 text encoder (SD3, Flux, Chroma).
LLM: A language model used as the text encoder (Qwen2.5-VL for Qwen-Image, Qwen3-4B for Z-Image).

In the model's Components tab, Backend.AI GO shows the companion files each multi-file model requires, downloads any that are missing with one click, and validates that all required files are present before the model can load.
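The validation step can be sketched as a lookup from model type to required companions, taken from the model table earlier in this section, followed by a set difference. The names REQUIRED_COMPONENTS and missing_components are hypothetical, and treating unlisted model types as needing no companions is an assumption.

```python
# Companion files per multi-file model type, as listed in the model table.
REQUIRED_COMPONENTS = {
    "SD3": {"CLIP-L", "CLIP-G", "T5-XXL"},
    "Flux": {"VAE", "CLIP-L", "T5-XXL"},
    "Chroma": {"VAE", "T5-XXL"},
    "Qwen-Image": {"VAE", "LLM"},
    "Z-Image": {"VAE", "LLM"},
}


def missing_components(model_type: str, present: set[str]) -> list[str]:
    """Return the companions still needed before `model_type` can load.

    Assumption: model types not in the table need no companion files.
    """
    required = REQUIRED_COMPONENTS.get(model_type, set())
    return sorted(required - present)
```

For a Flux model with only the VAE on disk, missing_components("Flux", {"VAE"}) reports CLIP-L and T5-XXL; an empty result means the model is ready to load.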

Shared companion store

Companion files live in a single shared store: each file is downloaded once, then reused by every model that needs it. For example, among Flux, Chroma, and SD3, the VAE, CLIP-L, and T5-XXL files are shared by whichever of those models use them. Once you download t5xxl_fp16.safetensors for one model, every other model that needs it reuses the same file instead of downloading it again. You can also point a model at a manually placed companion file sitting next to it on disk.
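The lookup described above can be sketched as a two-step search: first next to the model, then in the shared store. The directory layout, the function names, and the choice to prefer the adjacent file over the shared one are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_companion(filename: str, model_dir: Path, store_dir: Path) -> Optional[Path]:
    """Find a companion file, or return None if it still needs downloading.

    Assumption: a manually placed file next to the model takes precedence
    over the copy in the shared store.
    """
    for candidate in (Path(model_dir) / filename, Path(store_dir) / filename):
        if candidate.is_file():
            return candidate
    return None


def demo() -> list:
    """Show a lookup before and after the file lands in the shared store."""
    name = "t5xxl_fp16.safetensors"
    with tempfile.TemporaryDirectory() as root:
        model_dir = Path(root) / "models" / "flux"
        store_dir = Path(root) / "companions"
        model_dir.mkdir(parents=True)
        store_dir.mkdir()
        before = resolve_companion(name, model_dir, store_dir)
        (store_dir / name).touch()  # stand-in for a completed download
        after = resolve_companion(name, model_dir, store_dir)
        return [before, after.name, after.parent.name]
```

Because every model resolves against the same store directory, a file downloaded for one model is found by the next model that asks for it.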

Backend.AI GO's local image generation builds on the work of leejet and the stable-diffusion.cpp community.