9.7. Team AI with Multi-Node¶
When one machine is not enough—whether your team has grown, you need GPU diversity, or you want fault tolerance—Backend.AI GO's multi-node clustering lets you pool several machines into a single AI infrastructure. Team members connect to any node and access all models across the cluster, without worrying about which physical machine hosts them.
This guide walks through setting up a multi-node cluster, configuring external access, and managing the cluster for team-wide use.
Why Multi-Node?¶
| Single-node setup | Multi-node cluster |
|---|---|
| One GPU, one machine | Pool GPUs across multiple machines |
| One user at a time (or queued) | Concurrent access for the whole team |
| Single point of failure | Fault tolerance — other nodes keep running |
| Limited to one hardware type | Mix hardware: Apple Silicon + NVIDIA + AMD |
Prerequisites¶
Before you begin, make sure you have:
- Two or more machines with Backend.AI GO installed
- All machines on the same local network (or reachable via VPN)
- At least one model downloaded on each node (nodes can host different models)
- Network firewall rules allowing traffic on the API port (default: 8080) and the discovery port (default: 5353 for mDNS)
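Before going further, it can help to confirm that the API port is actually reachable between machines. Below is a minimal sketch using only the Python standard library; the node address is a placeholder, so substitute one of your own. Note that mDNS discovery uses UDP multicast, so a plain TCP check like this only covers the API port, not port 5353.

```python
import socket

def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Placeholder address: replace with one of your cluster nodes.
print(port_reachable("192.168.1.50", 8080, timeout=1.0))
```

If this returns False from a machine that should be able to reach the node, check the firewall rules above before suspecting the cluster setup itself.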
Architecture Overview¶
A multi-node cluster consists of peer nodes that discover each other automatically:
┌──────────────┐      ┌──────────────┐      ┌──────────────┐
│    Node A    │      │    Node B    │      │    Node C    │
│ (Mac Studio) │      │ (Linux GPU)  │      │  (Windows)   │
│  Llama 3 8B  │◄────►│  Mistral 7B  │◄────►│    Phi-3     │
│ MLX backend  │      │ CUDA backend │      │ CUDA backend │
└──────────────┘      └──────────────┘      └──────────────┘
        ▲                     ▲                     ▲
        └─────────────────────┴─────────────────────┘
                      mDNS Discovery
                     Continuum Router
- Node Discovery: Nodes find each other using mDNS (Bonjour/Avahi) on the local network.
- Continuum Router: Each node's router maintains a view of all available models across the cluster and routes requests to the appropriate node.
- No Central Server: The cluster is fully peer-to-peer. Any node can be added or removed without reconfiguring others.
Setting Up a Multi-Node Cluster¶
Step 1: Enable External Access on Each Node¶
By default, Backend.AI GO binds its API server to localhost only. To allow other nodes and team members to connect, enable external access:
1. Open Settings > Advanced.
2. Enable Allow External Connections.
3. Set the API Port (default: 8080). Use the same port on all nodes for consistency.
4. (Optional) Set an API Key for authentication. All nodes in the cluster should use the same key.
Security
Enabling external access exposes the API server on your network. Always set an API key in team environments. For access beyond the local network, use a VPN rather than exposing the port to the public internet.
Step 2: Enable Node Discovery¶
Node discovery allows machines to find each other automatically.
1. Open Settings > Nodes.
2. Enable Node Discovery (mDNS).
3. The node will begin broadcasting its presence and listening for other nodes.
Within a few seconds, discovered nodes appear in the Settings > Nodes tab.
Step 3: Verify the Cluster¶
Once two or more nodes have discovery enabled:
1. Open Settings > Nodes on any node.
2. You should see all connected nodes listed with:
   - Node name and IP address
   - Available models on each node
   - Hardware information (GPU type, memory)
   - Connection status (online/offline)
3. Open the Model list — models from all nodes should be visible and marked with their host node.
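Cluster-wide model visibility can also be checked from a script. The sketch below groups models by host node from a /v1/models-style response; the per-model "node" field is an assumption for illustration (the standard OpenAI schema carries "id" and "owned_by"), so adapt the key to whatever your endpoint actually returns.

```python
from collections import defaultdict

def models_by_node(models_response: dict) -> dict:
    """Group model IDs by the node that hosts them.

    Assumes each entry in 'data' carries a 'node' field naming its host;
    adjust to your server's actual payload shape.
    """
    grouped = defaultdict(list)
    for entry in models_response.get("data", []):
        grouped[entry.get("node", "unknown")].append(entry["id"])
    return dict(grouped)

# Hypothetical response from GET /v1/models on any cluster node:
sample = {"data": [
    {"id": "llama-3-8b-instruct", "node": "node-a"},
    {"id": "mistral-7b", "node": "node-b"},
    {"id": "phi-3", "node": "node-c"},
]}
print(models_by_node(sample))
```

A quick script like this is handy for spotting a node whose models have silently dropped out of the cluster view.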
Different Models on Different Nodes
You don't need the same model on every node. A common pattern is to host different models on each machine so the team has access to a wider variety (e.g., a large coding model on the GPU server, a fast chat model on the Mac Mini).
Step 4: Connect Team Members¶
Team members can connect to the cluster in two ways:
Option A: Run Backend.AI GO on their machine
- Install Backend.AI GO on the team member's machine.
- Enable Node Discovery — the cluster nodes will appear automatically.
- The Continuum Router handles model routing transparently.
Option B: Use the OpenAI-compatible API directly
Team members can connect any OpenAI-compatible client to any node's API endpoint:
curl http://<node-ip>:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <api-key>" \
-d '{
"model": "llama-3-8b-instruct",
"messages": [{"role": "user", "content": "Hello"}]
}'
The request is automatically routed to the node hosting the requested model.
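The same request can be issued from Python with nothing but the standard library. This is a sketch mirroring the curl example above: the endpoint path and bearer-token scheme come from that example, while the host, API key, and model name are placeholders.

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str,
                       messages: list) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible chat endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request("http://192.168.1.50:8080", "team-secret-key",
                         "llama-3-8b-instruct",
                         [{"role": "user", "content": "Hello"}])
# To send it (the cluster routes to whichever node hosts the model):
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because routing is handled cluster-side, the base URL can point at any node; the model name alone determines where the request runs.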
Managing the Cluster¶
Load Balancing¶
When the same model is loaded on multiple nodes, the Continuum Router distributes requests across them. This provides:
- Higher throughput: Multiple requests handled in parallel
- Fault tolerance: If one node goes offline, requests route to remaining nodes
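The behavior described above can be pictured as round-robin selection that skips offline nodes. The following is an illustrative sketch of that idea, not Backend.AI GO's actual routing code:

```python
class ModelRouter:
    """Toy round-robin router over the nodes hosting each model."""

    def __init__(self, hosts: dict):
        self.hosts = hosts  # model name -> list of node names
        self.online = {n for nodes in hosts.values() for n in nodes}
        self._rr = {}       # model name -> rotating request counter

    def mark_offline(self, node: str):
        self.online.discard(node)

    def route(self, model: str) -> str:
        """Pick the next online node hosting the model, round-robin."""
        candidates = [n for n in self.hosts.get(model, []) if n in self.online]
        if not candidates:
            raise LookupError(f"no online node hosts {model!r}")
        i = self._rr.get(model, 0)
        self._rr[model] = i + 1
        return candidates[i % len(candidates)]

router = ModelRouter({"phi-3": ["node-a", "node-c"], "mistral-7b": ["node-b"]})
print(router.route("phi-3"))   # alternates between node-a and node-c
router.mark_offline("node-a")
print(router.route("phi-3"))   # fails over: only node-c remains
```

Replicating a popular model on two or three nodes gives you both halves of the benefit at once: parallel throughput while every replica is up, and transparent failover when one drops.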
Adding a New Node¶
- Install Backend.AI GO on the new machine.
- Download the desired models.
- Enable External Access and Node Discovery in Settings.
- The node automatically joins the cluster — no restart needed on existing nodes.
Removing a Node¶
Simply close Backend.AI GO or disable Node Discovery on the machine. Other nodes detect the departure within seconds and update their routing tables.
Monitoring¶
Each node's Dashboard shows:
- Cluster-wide model availability
- Per-node resource usage (CPU, GPU, memory)
- Request routing statistics
- Node health status
Network Configuration¶
Firewall Rules¶
Ensure the following ports are open between cluster nodes:
| Port | Protocol | Purpose |
|---|---|---|
| 8080 (or custom) | TCP | API server (model inference) |
| 5353 | UDP | mDNS node discovery |
VPN Considerations¶
If your team is distributed across different networks:
- Use a VPN (such as WireGuard or Tailscale) to place all nodes on a virtual LAN.
- mDNS discovery works within VPN subnets that support multicast.
- Alternatively, manually add nodes using a connection key: generate a connection key (bago:// URL or QR code) on the host node from Settings > Nodes, then register it on the connecting node.
Tailscale
Tailscale creates a flat network between all your devices with zero configuration for firewall rules. It supports mDNS forwarding, making it an excellent choice for distributed Backend.AI GO clusters.
Security Considerations¶
| Concern | Recommendation |
|---|---|
| Unauthorized access | Set a strong API key on all nodes |
| Data in transit | Use a VPN for encrypted communication between nodes |
| Node impersonation | Keep the cluster on trusted networks only |
| Model access control | Currently all models are shared with all cluster members. Restrict at the network level if needed. |
Not for Public Internet
The multi-node feature is designed for trusted local networks and VPNs. Do not expose Backend.AI GO ports directly to the public internet without additional security layers (reverse proxy with TLS, IP allowlists, etc.).
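If a node must be reachable from outside a VPN, put it behind a TLS-terminating reverse proxy rather than exposing the port directly. A minimal nginx sketch of the "reverse proxy with TLS plus IP allowlist" approach mentioned above — the domain, certificate paths, and allowlisted range are all placeholders:

```nginx
server {
    listen 443 ssl;
    server_name ai.example.internal;            # placeholder domain

    ssl_certificate     /etc/ssl/certs/ai.example.internal.pem;
    ssl_certificate_key /etc/ssl/private/ai.example.internal.key;

    location /v1/ {
        allow 203.0.113.0/24;                   # example office egress range
        deny  all;
        proxy_pass http://127.0.0.1:8080;       # Backend.AI GO API on this host
        proxy_set_header Host $host;
    }
}
```

Even with a proxy in front, keep the API key enabled: the proxy restricts who can reach the port, while the key restricts who can use it.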
Example: Team Setup for a Startup¶
A startup with 5 developers sets up team AI:
Hardware:
- Node A — Mac Studio M2 Ultra (192 GB) → hosts Llama 3 70B (MLX, 4-bit)
- Node B — Linux workstation with RTX 4090 → hosts CodeLlama 34B (CUDA)
- Node C — Mac Mini M4 (32 GB) → hosts Phi-3 Mini (fast, for quick tasks)
Setup:
- All three machines on the office network.
- Each machine has Backend.AI GO installed with External Access and Node Discovery enabled.
- API key team-secret-key-2024 set on all nodes.
- Developers run Backend.AI GO on their laptops — all three nodes and their models appear automatically.
Usage:
- Developers use Llama 3 70B for complex reasoning tasks.
- CodeLlama 34B handles code generation and review.
- Phi-3 Mini serves quick chat and summarization requests with minimal latency.
- If Node B goes down for maintenance, code generation requests automatically fail over to the next available node.
Tips¶
- Specialize nodes by hardware. Run large models on powerful GPUs and fast small models on energy-efficient hardware.
- Use the same API key across all nodes for seamless authentication.
- Monitor resource usage on the Dashboard to identify bottlenecks and plan capacity.
- Download models during off-hours. Large model downloads consume bandwidth that could affect cluster performance.
- Keep Backend.AI GO updated to the same version on all nodes for compatibility.
Troubleshooting¶
| Problem | Solution |
|---|---|
| Nodes do not discover each other | Verify mDNS port (5353/UDP) is open. Check that all nodes are on the same subnet. Some corporate networks block multicast — use connection keys for manual node registration as a fallback. |
| "Connection refused" when accessing a node | Ensure External Access is enabled on the target node and the API port is open in the firewall. |
| Model requests route to the wrong node | The Continuum Router selects based on model availability. If a model exists on multiple nodes, routing is automatic. Check that the model name matches exactly. |
| Slow performance with many users | Add more nodes with the same popular models to enable load balancing. Consider using a faster model for high-frequency tasks. |
| Node appears offline intermittently | Check network stability. Wireless connections can cause discovery flapping — prefer wired connections for cluster nodes. |
Related Pages¶
- Headless Mode — Run Backend.AI GO as a background service for dedicated serving
- Cluster Integration — Connect to Backend.AI Enterprise clusters
- Building Apps with the API — Use the OpenAI-compatible API endpoint
- Settings — Configure network and access settings