CLI Reference

The backend-ai-go CLI tool provides command-line access to the Backend.AI GO Management API. Use this tool to manage local models, control inference servers, monitor system resources, and interact with loaded models from the terminal.

Installation

The CLI is included with the Backend.AI GO distribution. If you are building from source:

cd cli
cargo install --path .

Usage

backend-ai-go [OPTIONS] <COMMAND>

Global Options

  • --endpoint, -e (env: BACKEND_AI_GO_ENDPOINT): Management API endpoint (URL or a configured endpoint name).
  • --token, -t (env: BACKEND_AI_GO_TOKEN): API authentication token.
  • --output, -o (env: BACKEND_AI_GO_OUTPUT): Output format: console, json, or yaml.
  • --quiet, -q: Suppress non-essential output.
  • --verbose, -v: Enable verbose output.
  • --no-verify-ssl: Skip SSL certificate verification.
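Global options can be combined with any command. As an illustrative sketch (the endpoint URL below is a placeholder, not a documented default):

```shell
# Query a specific endpoint with an explicit token, emitting JSON.
# http://localhost:8000 is an assumed example address.
backend-ai-go --endpoint http://localhost:8000 \
              --token "$BACKEND_AI_GO_TOKEN" \
              --output json \
              system health
```

Setting the corresponding environment variables instead lets you drop these flags from every invocation.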

Commands

config - Configuration Management

Manage CLI configuration settings.

  • backend-ai-go config path: Show configuration file path.
  • backend-ai-go config get <KEY>: Get a configuration value.
  • backend-ai-go config set <KEY> <VALUE>: Set a configuration value.
  • backend-ai-go config list: List all configuration values.
  • backend-ai-go config reset: Reset configuration to defaults.
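A sketch of how these subcommands compose; the key name endpoint is an assumed example, not a confirmed configuration key:

```shell
# Persist a default endpoint so -e/--endpoint can be omitted later.
# "endpoint" is an illustrative key name.
backend-ai-go config set endpoint http://localhost:8000

# Read the value back and review the full configuration.
backend-ai-go config get endpoint
backend-ai-go config list
```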

model - Local Model Management

Manage models stored on the local disk.

  • backend-ai-go model list: List all local models.
  • backend-ai-go model info <MODEL_ID>: Get detailed information about a specific model.
  • backend-ai-go model refresh: Refresh the model index (scan for new files).
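A typical workflow after copying new model files onto disk might look like the following (the model ID is the one used in the Examples section below):

```shell
# Rescan the local model directory, then inspect a specific model.
backend-ai-go model refresh
backend-ai-go model info "gemma-3n-E4B-it-Q4_K_M"
```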

loaded - Loaded Model Operations

Control models currently loaded into memory for inference.

  • backend-ai-go loaded list: List currently loaded models.
  • backend-ai-go loaded info <ID>: Get details of a loaded model instance.
  • backend-ai-go loaded load [OPTIONS] <MODEL_ID>: Load a model into memory.
    • Options:
      • -c, --context-length <INT>: Override context length.
      • -g, --gpu-layers <INT>: Number of layers to offload to GPU (-1 for all).
      • -t, --threads <INT>: Number of threads to use.
      • -a, --alias <STRING>: Model alias for routing.
      • --tool-calling: Enable tool calling capabilities.
      • --mmproj <PATH>: Path to mmproj file for vision models.
  • backend-ai-go loaded unload <ID>: Unload a model to free resources.
  • backend-ai-go loaded health <ID>: Check the health status of a loaded model.
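As a hedged example of the load/health/unload lifecycle for a vision-capable model (the model ID, mmproj path, and the use of the alias as the instance <ID> are all illustrative assumptions):

```shell
# Load with full GPU offload, an alias for routing, and a vision projector.
backend-ai-go loaded load "my-vision-model" \
    --gpu-layers -1 \
    --alias vision \
    --mmproj ./mmproj-model.gguf

# Check the instance, then unload it to free memory.
backend-ai-go loaded health vision
backend-ai-go loaded unload vision
```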

router - Router Control

Manage the Continuum Router service.

  • backend-ai-go router status: Get the current status of the router.
  • backend-ai-go router start: Start the router service.
  • backend-ai-go router stop: Stop the router service.
  • backend-ai-go router restart: Restart the router service.
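For example, to restart the router and confirm it came back up:

```shell
backend-ai-go router restart
backend-ai-go router status
```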

system - System Monitoring

Monitor hardware resources and API status.

  • backend-ai-go system info: Get general system information (OS, architecture).
  • backend-ai-go system metrics: Get current system metrics (CPU, RAM usage).
  • backend-ai-go system gpu: Get detailed GPU information.
  • backend-ai-go system health: Check the overall API health.
  • backend-ai-go system version: Get the API server version.

Examples

List all available models in JSON format:

backend-ai-go model list -o json

Load a model with custom GPU layers:

backend-ai-go loaded load "gemma-3n-E4B-it-Q4_K_M" --gpu-layers 33

Check system GPU status:

backend-ai-go system gpu