
6.5. stable-diffusion.cpp Acceleration

stable-diffusion.cpp provides local image generation in Backend.AI GO. Just as llama.cpp made local LLM inference practical, stable-diffusion.cpp runs image generation efficiently on your own hardware.

What is stable-diffusion.cpp?

stable-diffusion.cpp is a pure C/C++ implementation of Stable Diffusion inference, optimized for consumer hardware.

The project brings the same philosophy as llama.cpp to image generation: minimal dependencies, maximum efficiency, and broad hardware compatibility.

Why is it Special?

1. Cross-Platform Efficiency

Like llama.cpp, stable-diffusion.cpp is a self-contained solution that works across Windows, macOS, and Linux without heavy dependencies on Python or large ML frameworks.

2. Hardware Acceleration

The project supports multiple acceleration backends:

| Platform | Backend | Description |
|----------|---------|-------------|
| macOS | Metal | Optimized for Apple Silicon (M1-M5) with unified memory support |
| Windows | CUDA | High performance with NVIDIA GPUs |
| Windows | Vulkan | Cross-vendor GPU support (AMD, Intel, NVIDIA) |
| Linux | CUDA / ROCm | Support for NVIDIA and AMD GPUs |
| CPU | AVX2 / AVX-512 | Fallback for any machine |

3. Memory Efficient

The implementation uses memory-mapped loading and efficient memory management, allowing you to run diffusion models even on systems with limited VRAM.
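
To illustrate the general idea of memory-mapped loading, here is a minimal Python sketch. It is not stable-diffusion.cpp's actual C++ loader, and the helper name read_safetensors_header is hypothetical. It maps a .safetensors file read-only and parses only the JSON header, so the tensor data is never copied into memory up front.

```python
import json
import mmap
import struct


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without reading tensor data."""
    with open(path, "rb") as f:
        # Map the file read-only; the OS pages data in lazily as it is accessed.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # The first 8 bytes are a little-endian uint64 holding the header length.
            (header_len,) = struct.unpack("<Q", mm[:8])
            # The header itself is UTF-8 JSON describing each tensor.
            return json.loads(mm[8:8 + header_len].decode("utf-8"))
```

Because only the touched pages are loaded, inspecting a multi-gigabyte checkpoint this way costs a few kilobytes of reads rather than the full file size.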

Supported Model Types

Backend.AI GO supports multiple diffusion model architectures through stable-diffusion.cpp:

| Model Type | Description | Typical Size |
|------------|-------------|--------------|
| SD 1.x | Original Stable Diffusion (1.4, 1.5) | ~2-4 GB |
| SD 2.x | Stable Diffusion 2.0, 2.1 | ~4-5 GB |
| SDXL | Stable Diffusion XL | ~6-7 GB |
| SD3 | Stable Diffusion 3 / 3.5 (multi-file: CLIP-L + CLIP-G + T5-XXL) | ~10+ GB |
| Flux | Black Forest Labs Flux architecture (multi-file: VAE + CLIP-L + T5-XXL) | ~12+ GB |
| Chroma | Flux-derived architecture (multi-file: VAE + T5-XXL) | ~10+ GB |
| Qwen-Image | Alibaba's Qwen image generation (multi-file: VAE + Qwen2.5-VL encoder) | Varies |
| Z-Image | Z-Image / Z-Image Turbo (multi-file: VAE + Qwen3-4B encoder) | ~5+ GB |

Role in Backend.AI GO

Integration

Backend.AI GO bundles the sd-server binary as a sidecar process. When you load a diffusion model:

  1. The app spawns sd-server as a dedicated background process.

  2. It provides an OpenAI-compatible /v1/images/generations API endpoint.

  3. Multiple diffusion models can be loaded simultaneously (pool management with LRU eviction).

  4. The process is fully managed with health checks and automatic cleanup.
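
As a rough sketch of how a client might talk to the endpoint from step 2, the snippet below builds a request for /v1/images/generations. The host, the port (taken as the first port of the sd-server pool), and the payload fields (prompt, n, size, borrowed from the OpenAI Images API) are assumptions; check which fields your build of sd-server actually accepts.

```python
import json
import urllib.request

# Assumption: the sidecar listens on localhost at the first sd-server pool port.
BASE_URL = "http://127.0.0.1:39100"


def build_image_request(prompt, size="512x512", n=1):
    """Build (but do not send) a POST request for /v1/images/generations."""
    # Field names follow the OpenAI Images API; server support is assumed.
    payload = {"prompt": prompt, "n": n, "size": size}
    return urllib.request.Request(
        BASE_URL + "/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_image_request("a lighthouse at dusk, watercolor")
# To send it (requires a loaded diffusion model):
#   with urllib.request.urlopen(req) as resp:
#       result = json.load(resp)
```

Building the request separately from sending it makes it easy to inspect the URL and body before a model is loaded.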

Port Management

The sd-server pool uses ports 39100-39119, separate from the llama-server pool. This allows running both text generation and image generation simultaneously without port conflicts.
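
The following toy model shows how a fixed port range and LRU eviction can fit together. It is an illustrative sketch, not Backend.AI GO's actual pool manager: the PortPool class, its max_loaded parameter, and the one-model-per-port policy are all assumptions.

```python
from collections import OrderedDict

SD_PORTS = range(39100, 39120)  # ports 39100-39119 inclusive


class PortPool:
    """Toy model-to-port pool with least-recently-used eviction."""

    def __init__(self, ports=SD_PORTS, max_loaded=None):
        self.ports = list(ports)
        # Assumption: at most one model per port, so capacity is capped by the range.
        self.max_loaded = min(max_loaded or len(self.ports), len(self.ports))
        self.loaded = OrderedDict()  # model name -> port, least recently used first

    def acquire(self, model):
        """Return (port, evicted_model_or_None) for the given model."""
        if model in self.loaded:
            self.loaded.move_to_end(model)  # mark as most recently used
            return self.loaded[model], None
        evicted = None
        if len(self.loaded) >= self.max_loaded:
            # Pool is full: drop the least recently used model to free its port.
            evicted, _ = self.loaded.popitem(last=False)
        used = set(self.loaded.values())
        port = next(p for p in self.ports if p not in used)
        self.loaded[model] = port
        return port, evicted
```

With max_loaded=2, loading a third model evicts whichever of the first two was used least recently and hands its port to the newcomer.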

Key Settings

When configuring image generation in Backend.AI GO:

  • GPU Layers: Controls how many layers are offloaded to GPU. Use -1 to offload everything for maximum speed.

  • Threads: Number of CPU threads to use. Auto-detected if not specified.

  • CFG Scale: Classifier-Free Guidance scale (typically 7.0-8.0). Higher values follow the prompt more closely.

  • Sampling Steps: More steps generally mean higher quality but slower generation. 20-30 is a good balance.

  • Sampler: The sampling algorithm (euler, euler_a, dpm++2m, etc.). euler_a is a good default.
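
One way to picture these settings together is as a small configuration object. The sketch below is illustrative only: the ImageGenSettings class and its field names are hypothetical and do not reflect Backend.AI GO's internal schema. The defaults and ranges are the ones suggested in the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImageGenSettings:
    gpu_layers: int = -1            # -1 offloads every layer to the GPU
    threads: Optional[int] = None   # None means auto-detect
    cfg_scale: float = 7.0          # typical range 7.0-8.0
    steps: int = 25                 # 20-30 balances quality and speed
    sampler: str = "euler_a"

    def warnings(self):
        """Return notes for values outside the ranges suggested above."""
        notes = []
        if not 7.0 <= self.cfg_scale <= 8.0:
            notes.append(f"cfg_scale {self.cfg_scale} is outside the typical 7.0-8.0 range")
        if not 20 <= self.steps <= 30:
            notes.append(f"steps {self.steps} is outside the suggested 20-30 range")
        if self.threads is not None and self.threads < 1:
            notes.append("threads must be at least 1 when set explicitly")
        return notes
```

The checks return notes rather than raising errors, since values outside the typical ranges can still be valid choices for a particular model.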

Supported Formats

Backend.AI GO accepts diffusion models in the following formats:

  • .safetensors: The recommended format. Safe, fast to load, and widely supported.

  • .ckpt: Legacy checkpoint format. It works, but .safetensors is preferred.

  • .gguf: Quantized format for reduced memory usage.
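
A simple extension check captures the three formats listed above. This is a minimal sketch; the classify_model_file helper and its labels are hypothetical, and the real application may inspect file contents rather than names.

```python
from pathlib import Path

# Extension -> short note, taken from the list of supported formats.
SUPPORTED_FORMATS = {
    ".safetensors": "recommended",
    ".ckpt": "legacy",
    ".gguf": "quantized",
}


def classify_model_file(path):
    """Return the format note for a model file, or None if unsupported."""
    return SUPPORTED_FORMATS.get(Path(path).suffix.lower())
```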

Model Components

Multi-file models (SD3, Flux, Chroma, Qwen-Image, Z-Image) load a main diffusion model plus one or more companion files:

  • VAE: Variational Auto-Encoder for image encoding/decoding (Flux, Chroma, Qwen-Image, Z-Image).

  • CLIP-L: CLIP text encoder (SDXL, SD3, Flux).

  • CLIP-G: Second CLIP text encoder (SDXL, SD3).

  • T5-XXL: T5 text encoder (SD3, Flux, Chroma).

  • LLM: An LLM text encoder — Qwen2.5-VL for Qwen-Image, Qwen3-4B for Z-Image.

In the model's Components tab, Backend.AI GO shows the companion files each multi-file model requires, downloads any that are missing with one click, and validates that all required files are present before the model can load.
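
The validation step can be sketched as a set difference between what a model type requires and what is already on disk. The component sets below are taken from the Supported Model Types table; the missing_components helper is hypothetical, and "LLM" stands for the Qwen2.5-VL or Qwen3-4B encoder depending on the model.

```python
# Required companions per multi-file model type, from the Supported Model Types table.
REQUIRED_COMPONENTS = {
    "SD3": {"CLIP-L", "CLIP-G", "T5-XXL"},
    "Flux": {"VAE", "CLIP-L", "T5-XXL"},
    "Chroma": {"VAE", "T5-XXL"},
    "Qwen-Image": {"VAE", "LLM"},
    "Z-Image": {"VAE", "LLM"},
}


def missing_components(model_type, available):
    """Return the sorted list of required companions not yet available."""
    # Model types without an entry are treated as single-file (no companions).
    required = REQUIRED_COMPONENTS.get(model_type, set())
    return sorted(required - set(available))
```

A model is ready to load when missing_components returns an empty list; otherwise the result names the files that still need to be downloaded.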

Shared companion store

Companion files are kept in a single shared store: each file is downloaded once, then reused by every model that needs it. For example, the VAE and the CLIP-L and T5-XXL text encoders are shared among Flux, Chroma, and SD3 wherever a model requires them. Once you download t5xxl_fp16.safetensors for one model, every other model that needs it reuses the same file instead of downloading it again. You can also point a model at a manually placed companion file sitting next to it on disk.
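
The lookup described here can be sketched as a search over two locations. The resolve_companion helper and the directory layout are hypothetical, and the search order (a file next to the model wins over the shared store) is an assumption rather than documented behavior.

```python
from pathlib import Path


def resolve_companion(filename, model_path, shared_store):
    """Return the path to a companion file, or None if it must be downloaded."""
    candidates = [
        # Assumption: a manually placed file next to the model takes priority.
        Path(model_path).parent / filename,
        # Otherwise fall back to the shared store used by all models.
        Path(shared_store) / filename,
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

Because every model resolves the same filename against the same store, a file downloaded for one model is found automatically by the next model that needs it.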


Backend.AI GO enables local image generation through the innovative work of leejet and the stable-diffusion.cpp community.