10.7. CLI Reference¶
The aigo CLI tool provides command-line access to the Backend.AI GO Management API. Use this tool to manage local models, control inference servers, monitor system resources, and interact with loaded models from the terminal.
Installation¶
The CLI is included with the Backend.AI GO distribution. If you are building from source:
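A typical Go build might look like the following (a sketch only; the `./cmd/aigo` package path is an assumption about the repository layout, not confirmed by this reference):

```shell
# From the repository root (the ./cmd/aigo path is an assumed layout)
go build -o aigo ./cmd/aigo

# Verify the binary
./aigo --help
```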
Usage¶
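The general invocation shape, inferred from the global options and commands documented below:

```shell
aigo [GLOBAL OPTIONS] <COMMAND> [SUBCOMMAND] [ARGS...]
```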
Auto-Discovery¶
When --endpoint is not specified, the CLI automatically discovers a running Backend.AI GO instance by reading a discovery file written by the Management API server at startup. No configuration is required for the most common case of connecting to a locally running instance.
Endpoint resolution order:
1. `--endpoint` flag or `BACKEND_AI_GO_ENDPOINT` environment variable (explicit override)
2. Config file endpoint (if changed from the default via `aigo config set endpoint ...`)
3. Auto-discovery file (if a local instance is running and healthy)
4. Default fallback: `http://127.0.0.1:8001`
Discovery file locations by OS:
- macOS: `~/Library/Application Support/ai.backend.go/mgmt-api.json`
- Linux: `$XDG_RUNTIME_DIR/ai.backend.go/mgmt-api.json` (fallback: `~/.config/ai.backend.go/mgmt-api.json`)
- Windows: `%APPDATA%\ai.backend.go\mgmt-api.json`
Before connecting, the CLI validates the discovery file by checking that the server process (identified by PID) is still running and that the endpoint responds to a health check. Stale files from crashed instances are silently ignored.
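The stale-file check can be sketched as follows (a minimal sketch, assuming the discovery file stores `endpoint` and `pid` keys; the real field names may differ, and the HTTP health check that follows the PID probe is omitted here):

```python
import json
import os
import pathlib

def load_discovery(path):
    """Return the endpoint from a discovery file, or None if the file is stale.

    Assumes the file shape {"endpoint": "...", "pid": 1234}; the actual CLI
    additionally probes the endpoint with an HTTP health check before use.
    """
    try:
        data = json.loads(pathlib.Path(path).read_text())
    except (OSError, ValueError):
        return None  # missing or malformed file: fall back to later resolution steps
    pid = data.get("pid")
    if not isinstance(pid, int):
        return None
    try:
        os.kill(pid, 0)  # signal 0 probes liveness without delivering anything (POSIX)
    except ProcessLookupError:
        return None      # server crashed: silently ignore the stale file
    except PermissionError:
        pass             # process exists but is owned by another user
    return data.get("endpoint")
```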
Global Options¶
| Option | Short | Environment Variable | Description |
|---|---|---|---|
| --endpoint | -e | BACKEND_AI_GO_ENDPOINT | Management API endpoint (URL or configured name). Overrides auto-discovery. |
| --token | -t | BACKEND_AI_GO_TOKEN | API authentication token. |
| --output | -o | BACKEND_AI_GO_OUTPUT | Output format: console, json, yaml. |
| --quiet | -q | | Suppress non-essential output. |
| --verbose | -v | | Enable verbose output. |
| --no-verify-ssl | | | Skip SSL certificate verification. |
Commands¶
chat - One-Shot Chat Completion¶
Send a single message to a loaded model and print the response.
If MESSAGE is omitted, input is read from stdin (up to 1 MiB).
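How such a cap might be enforced (an illustrative sketch; `read_message` is a hypothetical helper, not part of the aigo source):

```python
import sys

MAX_STDIN = 1 << 20  # 1 MiB cap on piped input, per the rule above

def read_message(argv_message=None, stream=None):
    """Return MESSAGE from the command line if given, else read stdin.

    Reads one byte past the cap so oversized input can be detected
    without buffering arbitrarily large data.
    """
    if argv_message is not None:
        return argv_message
    stream = stream if stream is not None else sys.stdin.buffer
    data = stream.read(MAX_STDIN + 1)
    if len(data) > MAX_STDIN:
        raise SystemExit("error: stdin input exceeds 1 MiB")
    return data.decode("utf-8", errors="replace")
```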
Options:
| Option | Short | Description |
|---|---|---|
| --model <MODEL> | -m | Model to use for completion. |
| --max-tokens <INT> | | Maximum tokens to generate (default: 1024). |
| --temperature <FLOAT> | | Sampling temperature 0.0–2.0 (default: 0.7). Ignored when --reasoning-effort is set. |
| --system <PROMPT> | -s | System prompt to prepend. |
| --reasoning-effort <LEVEL> | | Reasoning effort level for hybrid-thinking models. Accepted values: none, low, medium, high, xhigh. Use none to disable thinking mode via chat_template_kwargs. |
| --no-think | | Disable thinking mode (sets chat_template_kwargs.enable_thinking=false). Takes precedence over --reasoning-effort. |
| --thinking-budget <N> | | Per-request cap on tokens emitted inside the <think> block (sent as thinking_budget_tokens in the request body). -1 = unlimited (engine default), 0 = immediate end (disables thinking), N>0 = hard cap of N tokens. Engine-agnostic: works on both llama-server and mlxcel-server. |
| --preserve-thinking | | Retain <think> blocks from all prior assistant turns instead of stripping them (Qwen3.6+ feature). Sets chat_template_kwargs.preserve_thinking=true. Orthogonal to --no-think / --reasoning-effort; both kwargs coexist when flags are combined. Older Qwen3/3.5 models accept the flag but behavior is unvalidated. |
When --reasoning-effort is set to a level other than none, the request sends both reasoning_effort and chat_template_kwargs: {"enable_thinking": true}. When set to none, or when --no-think is passed, only chat_template_kwargs: {"enable_thinking": false} is sent, which is the correct way to suppress the <think> block on Qwen3/3.5 hybrid-thinking models.
--thinking-budget and --preserve-thinking are independent of --reasoning-effort: the budget caps how many tokens the model can emit inside <think>, and preserve_thinking controls whether prior <think> blocks survive in the prompt. Both fields travel in the per-request HTTP body, so they are forwarded unchanged to llama-server and mlxcel-server (and through the continuum-router passthrough path).
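Putting the flag-to-field mapping together, a client could assemble the request body like this (an illustrative sketch; `build_chat_body` is a hypothetical helper, not part of aigo, but the field names follow the description above):

```python
def build_chat_body(message, model, reasoning_effort=None, no_think=False,
                    thinking_budget=None, preserve_thinking=False):
    """Map the documented chat flags onto a request body (illustrative)."""
    body = {"model": model,
            "messages": [{"role": "user", "content": message}]}
    kwargs = {}
    if no_think or reasoning_effort == "none":
        # --no-think (or --reasoning-effort none) suppresses the <think>
        # block; reasoning_effort itself is then omitted from the body.
        kwargs["enable_thinking"] = False
    elif reasoning_effort is not None:
        body["reasoning_effort"] = reasoning_effort
        kwargs["enable_thinking"] = True
    if preserve_thinking:
        kwargs["preserve_thinking"] = True  # Qwen3.6+: keep prior <think> turns
    if thinking_budget is not None:
        # -1 = unlimited, 0 = disable thinking, N>0 = hard cap
        body["thinking_budget_tokens"] = thinking_budget
    if kwargs:
        body["chat_template_kwargs"] = kwargs
    return body
```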
Examples:

```shell
# Basic chat
aigo chat "What is the capital of France?"

# Disable thinking mode on a Qwen3 model
aigo chat --no-think "Summarize this document" < report.txt

# Enable thinking with medium effort
aigo chat --reasoning-effort medium "Solve this step by step: ..."

# Cap thinking at 64 tokens (force concise reasoning)
aigo chat --thinking-budget 64 --reasoning-effort high "Quick: 2+2=?"

# Disable thinking via the budget (equivalent to --no-think for engines that implement it)
aigo chat --thinking-budget 0 "Just answer directly."

# Preserve <think> blocks across turns on Qwen3.6+ (improves agent KV cache reuse)
aigo chat --preserve-thinking --reasoning-effort high "Continue solving from where we left off."

# Pipe input with a system prompt
echo "SELECT * FROM users" | aigo chat --system "You are a SQL expert."
```
complete - One-Shot Text Completion¶
Send a prompt for text completion (non-chat format).
If PROMPT is omitted, input is read from stdin.
Options:
| Option | Short | Description |
|---|---|---|
| --model <MODEL> | -m | Model to use. |
| --max-tokens <INT> | | Maximum tokens to generate (default: 256). |
| --temperature <FLOAT> | | Sampling temperature 0.0–2.0 (default: 0.7). |
config - Configuration Management¶
Manage CLI configuration settings.
- `aigo config path`: Show the configuration file path.
- `aigo config get <KEY>`: Get a configuration value.
- `aigo config set <KEY> <VALUE>`: Set a configuration value.
- `aigo config list`: List all configuration values.
- `aigo config reset`: Reset configuration to defaults.
model - Local Model Management¶
Manage models stored on the local disk.
- `aigo model list`: List all local models.
- `aigo model info <MODEL_ID>`: Get detailed information about a specific model.
- `aigo model refresh`: Refresh the model index (scan for new files).
loaded - Loaded Model Operations¶
Control models currently loaded into memory for inference.
- `aigo loaded list`: List currently loaded models.
- `aigo loaded info <ID>`: Get details of a loaded model instance.
- `aigo loaded load [OPTIONS] <MODEL_ID>`: Load a model into memory. Options:
  - `-c, --context-length <INT>`: Override context length.
  - `-g, --gpu-layers <INT>`: Number of layers to offload to GPU (-1 for all).
  - `-t, --threads <INT>`: Number of threads to use.
  - `-a, --alias <STRING>`: Model alias for routing.
  - `--tool-calling`: Enable tool calling capabilities.
  - `--mmproj <PATH>`: Path to mmproj file for vision models.
- `aigo loaded unload <ID>`: Unload a model to free resources.
- `aigo loaded health <ID>`: Check the health status of a loaded model.
router - Router Control¶
Manage the Continuum Router service.
- `aigo router status`: Get the current status of the router.
- `aigo router start`: Start the router service.
- `aigo router stop`: Stop the router service.
- `aigo router restart`: Restart the router service.
system - System Monitoring¶
Monitor hardware resources and API status.
- `aigo system info`: Get general system information (OS, architecture).
- `aigo system metrics`: Get current system metrics (CPU, RAM usage).
- `aigo system gpu`: Get detailed GPU information.
- `aigo system health`: Check the overall API health.
- `aigo system version`: Get the API server version.
Examples¶
List all available models in JSON format:
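```shell
# Both the command and the --output flag are documented above
aigo model list --output json
```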
Load a model with custom GPU layers:
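```shell
# <MODEL_ID> is a placeholder for an ID from `aigo model list`; 32 is an example value
aigo loaded load --gpu-layers 32 <MODEL_ID>
```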
Check system GPU status:
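```shell
aigo system gpu
```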