
4.5. stable-diffusion.cpp Acceleration

stable-diffusion.cpp brings the power of local image generation to Backend.AI GO. Just as llama.cpp revolutionized local LLM inference, stable-diffusion.cpp enables efficient, high-quality image generation directly on your hardware.

What is stable-diffusion.cpp?

stable-diffusion.cpp is a pure C/C++ implementation of Stable Diffusion inference, optimized for consumer hardware.

The project brings the same philosophy as llama.cpp to image generation: minimal dependencies, maximum efficiency, and broad hardware compatibility.

Why is it Special?

1. Cross-Platform Efficiency

Like llama.cpp, stable-diffusion.cpp is a self-contained solution that works across Windows, macOS, and Linux without heavy dependencies on Python or large ML frameworks.

2. Hardware Acceleration

The project supports multiple acceleration backends:

| Platform | Backend | Description |
| --- | --- | --- |
| macOS | Metal | Optimized for Apple Silicon (M1-M5) with unified memory support |
| Windows | CUDA | High performance with NVIDIA GPUs |
| Windows | Vulkan | Cross-vendor GPU support (AMD, Intel, NVIDIA) |
| Linux | CUDA / ROCm | Support for NVIDIA and AMD GPUs |
| CPU | AVX2 / AVX-512 | Fallback for any machine |

3. Memory Efficiency

The implementation uses memory-mapped loading and efficient memory management, allowing you to run diffusion models even on systems with limited VRAM.
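The effect of memory-mapped loading can be illustrated with a small, generic Python sketch (not Backend.AI GO's actual loader): the OS pages in only the bytes that are touched, so a multi-gigabyte weight file need not be read into RAM up front.

```python
import mmap
import os
import tempfile

def read_header(path: str, n: int = 16) -> bytes:
    """Read the first n bytes of a file via mmap, without loading it fully."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Only the pages backing this slice are faulted into memory.
            return mm[:n]

# Stand-in for a large checkpoint file (1 MiB of mostly zeros).
path = os.path.join(tempfile.mkdtemp(), "weights.bin")
with open(path, "wb") as f:
    f.write(b"SAFE" + b"\x00" * (1024 * 1024 - 4))

header = read_header(path, 4)
```

The same principle lets stable-diffusion.cpp begin inference before the entire model has been read from disk.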

Supported Model Types

Backend.AI GO supports multiple diffusion model architectures through stable-diffusion.cpp:

| Model Type | Description | Typical Size |
| --- | --- | --- |
| SD 1.x | Original Stable Diffusion (1.4, 1.5) | ~2-4 GB |
| SD 2.x | Stable Diffusion 2.0, 2.1 | ~4-5 GB |
| SDXL | Stable Diffusion XL | ~6-7 GB |
| Flux | Black Forest Labs Flux architecture | ~12+ GB |
| Qwen-Image | Alibaba's Qwen image generation | Varies |

Role in Backend.AI GO

Integration

Backend.AI GO bundles the sd-server binary as a sidecar process. When you load a diffusion model:

  1. The app spawns sd-server as a dedicated background process.

  2. It provides an OpenAI-compatible /v1/images/generations API endpoint.

  3. Multiple diffusion models can be loaded simultaneously (pool management with LRU eviction).

  4. The process is fully managed with health checks and automatic cleanup.
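Because the endpoint follows the OpenAI images API shape, it can be called with any HTTP client. A minimal sketch of building such a request (the port, model name, and prompt here are assumptions for illustration):

```python
import json
import urllib.request

def build_image_request(base_url: str, model: str, prompt: str,
                        size: str = "512x512") -> urllib.request.Request:
    """Build an OpenAI-style /v1/images/generations POST request."""
    payload = {"model": model, "prompt": prompt, "n": 1, "size": size}
    return urllib.request.Request(
        url=f"{base_url}/v1/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical local sd-server on the first pool port.
req = build_image_request("http://127.0.0.1:39100",
                          "sd15-example", "a watercolor fox")
# urllib.request.urlopen(req) would send it once a model is loaded.
```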

Port Management

The sd-server pool uses ports 39100-39119, separate from the llama-server pool. This allows running both text generation and image generation simultaneously without port conflicts.
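The pool behavior described above (20 ports, LRU eviction when full) can be sketched as a small map from model name to port. This is a simplified illustration, not the actual implementation:

```python
from collections import OrderedDict

class PortPool:
    """Assign ports 39100-39119 to models, evicting the least recently used."""

    def __init__(self, start: int = 39100, end: int = 39119):
        self.free = list(range(start, end + 1))
        self.active = OrderedDict()  # model name -> port, oldest first

    def acquire(self, model: str) -> int:
        if model in self.active:
            self.active.move_to_end(model)   # mark as most recently used
            return self.active[model]
        if not self.free:                    # pool full: evict the LRU model
            _, port = self.active.popitem(last=False)
            self.free.append(port)
        port = self.free.pop(0)
        self.active[model] = port
        return port

pool = PortPool()
first_port = pool.acquire("sd15-example")  # gets 39100
```

In the real system, eviction would also shut down the evicted model's sd-server process before its port is reused.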

Key Settings

When configuring image generation in Backend.AI GO:

  • GPU Layers: Controls how many layers are offloaded to GPU. Use -1 to offload everything for maximum speed.

  • Threads: Number of CPU threads to use. Auto-detected if not specified.

  • CFG Scale: Classifier-Free Guidance scale (typically 7.0-8.0). Higher values follow the prompt more closely.

  • Sampling Steps: More steps generally mean higher quality but slower generation. 20-30 is a good balance.

  • Sampler: The sampling algorithm (euler, euler_a, dpm++2m, etc.). euler_a is a good default.
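Taken together, the recommendations above suggest a balanced default configuration. The sketch below expresses them as a settings dictionary; the field names are illustrative, not Backend.AI GO's actual schema:

```python
def default_generation_settings(**overrides) -> dict:
    """Balanced image-generation defaults reflecting the guidance above."""
    settings = {
        "gpu_layers": -1,      # -1 offloads every layer to the GPU
        "threads": None,       # None -> let the runtime auto-detect
        "cfg_scale": 7.0,      # typical range is 7.0-8.0
        "steps": 25,           # 20-30 balances quality and speed
        "sampler": "euler_a",  # a good default sampler
    }
    settings.update(overrides)
    if not 1.0 <= settings["cfg_scale"] <= 30.0:
        raise ValueError("cfg_scale out of a sensible range")
    return settings
```

Callers can override any field, e.g. `default_generation_settings(steps=30)` for higher quality at the cost of speed.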

Supported Formats

Backend.AI GO accepts diffusion models in the following formats:

  • .safetensors: The recommended format. Safe, fast to load, and widely supported.

  • .ckpt: Legacy checkpoint format. Works but safetensors is preferred.

  • .gguf: Quantized format for reduced memory usage.
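A loader distinguishing these formats would typically dispatch on the file extension. A minimal sketch with a hypothetical helper (not Backend.AI GO's actual code):

```python
from pathlib import Path

# Extensions accepted for diffusion models, per the list above.
SUPPORTED = {".safetensors", ".ckpt", ".gguf"}

def model_format(path: str) -> str:
    """Return the diffusion model format for a file, or raise if unsupported."""
    ext = Path(path).suffix.lower()
    if ext not in SUPPORTED:
        raise ValueError(f"unsupported model format: {ext or path}")
    return ext.lstrip(".")
```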

Model Components

Some models (especially SDXL and Flux) may require additional components:

  • VAE: Variational Auto-Encoder for image encoding/decoding.

  • CLIP-L: Text encoder for SDXL models.

  • T5-XXL: Text encoder for Flux models.

Backend.AI GO allows you to specify these optional components when loading a model.
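A load request with optional components might be assembled as below. The helper and its field names are hypothetical, shown only to illustrate that the VAE, CLIP-L, and T5-XXL paths are optional extras alongside the main model:

```python
from pathlib import Path
from typing import Optional

def component_args(model: str,
                   vae: Optional[str] = None,
                   clip_l: Optional[str] = None,
                   t5xxl: Optional[str] = None) -> dict:
    """Collect the model path plus any optional component paths."""
    args = {"model": model}
    for name, path in (("vae", vae), ("clip_l", clip_l), ("t5xxl", t5xxl)):
        if path is not None:
            args[name] = str(Path(path))  # omit components not supplied
    return args
```

For an SD 1.x model, `component_args("v1-5.safetensors")` alone suffices; a Flux model would typically also pass `vae` and `t5xxl` paths.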


Backend.AI GO enables local image generation through the innovative work of leejet and the stable-diffusion.cpp community.