Multi-Node Overview¶
Backend.AI GO is designed not just as a standalone application but as a node in a larger computational mesh. This architecture allows you to scale beyond the limits of your local hardware by connecting to other Backend.AI GO instances, Continuum Routers, or enterprise-grade Backend.AI Clusters.
Why Multi-Node?¶
Local inference is great for privacy and low latency, but it is constrained by your hardware (VRAM, compute power). Multi-node capabilities allow you to:
- Access Larger Models: Run models that exceed your local GPU's memory by offloading them to a server or cluster.
- Scale Throughput: Distribute inference requests across multiple nodes to handle higher concurrency.
- Centralize Management: Use Backend.AI GO as a client to manage and utilize resources from a powerful central server.
Node Types¶
In the Backend.AI GO ecosystem, there are three primary node types you can connect to:
- Local Node: Your current machine running Backend.AI GO. It runs its own Continuum Router and manages local models.
- Peer Node (Backend.AI GO): Another computer running Backend.AI GO. You can connect to it directly to use its loaded models.
- Backend.AI Cluster: An enterprise-grade cluster managed by Backend.AI Core. This provides massive scalability, user management, and security features.
How It Works¶
Backend.AI GO uses a Mesh Networking approach.
- The Continuum Router: At the heart of every node is the Continuum Router. It acts as an API gateway that routes your prompts to the appropriate backend (local process, remote peer, or cloud API).
- Unified API: Whether the model is running locally on your laptop or on an H100 cluster in a data center, Backend.AI GO treats it the same way. You simply select the model from the dropdown, and the system handles the routing.
- Security: Connections to Clusters are secured using HMAC-SHA256 authentication (Access Key & Secret Key).
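To make the authentication step concrete, here is a minimal sketch of HMAC-SHA256 request signing with an Access Key and Secret Key. The canonical string (method, path, date) and the header format shown are illustrative assumptions; the exact fields your cluster signs may differ, so consult your cluster's API documentation.

```python
import hashlib
import hmac
from datetime import datetime, timezone

def sign_request(secret_key: str, method: str, path: str, date: str) -> str:
    """Compute an HMAC-SHA256 signature over a canonical request string.

    The canonical string below is illustrative; the exact fields the
    cluster signs may differ.
    """
    canonical = f"{method}\n{path}\n{date}"
    digest = hmac.new(secret_key.encode(), canonical.encode(), hashlib.sha256)
    return digest.hexdigest()

# Hypothetical credentials, for illustration only.
access_key = "AKIA-EXAMPLE"
secret_key = "SECRET-EXAMPLE"

date = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
signature = sign_request(secret_key, "GET", "/v1/models", date)
headers = {
    # Assumed header layout; check your cluster's auth spec.
    "Authorization": f"signMethod=HMAC-SHA256, credential={access_key}:{signature}",
    "Date": date,
}
```

Because the signature is derived from the Secret Key rather than containing it, the key itself never travels over the wire.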
Getting Started¶
To expand your compute capabilities, you can:
- Manually Register a remote node or cluster.
- Use Auto-Discovery to find resources (services) available on connected nodes.
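The discovery step above can be sketched as follows. This example assumes each node exposes an OpenAI-compatible `/v1/models` endpoint (an assumption, not a documented guarantee); the `list_models` helper and the node names are hypothetical, and the merge step simply builds a unified catalog mapping each model to the nodes that serve it.

```python
import json
import urllib.request

def list_models(base_url: str) -> list[str]:
    """Fetch model IDs from a node, assuming an OpenAI-compatible
    /v1/models endpoint; adjust the path if your node differs."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        data = json.load(resp)
    return [m["id"] for m in data.get("data", [])]

def merge_model_catalogs(catalogs: dict[str, list[str]]) -> dict[str, list[str]]:
    """Build a unified catalog: model ID -> nodes that serve it."""
    merged: dict[str, list[str]] = {}
    for node, models in catalogs.items():
        for model in models:
            merged.setdefault(model, []).append(node)
    return merged

# Hypothetical example: both nodes serve an 8B model,
# but only the remote peer can host the 70B model.
catalogs = {
    "local": ["llama-3-8b"],
    "peer-gpu-box": ["llama-3-8b", "llama-3-70b"],
}
print(merge_model_catalogs(catalogs))
```

In practice the Continuum Router performs this aggregation for you; the sketch only illustrates why the same model can appear once in the dropdown even when several nodes serve it.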