2.3. Running Models & Your First Chat¶
Once you have downloaded a model, it's time to put it to work. Backend.AI GO provides a modern chat interface that feels familiar and responsive.
Loading a Model¶

Before you can chat, the model must be "loaded" from your disk into your computer's memory (RAM or VRAM).
- Navigate to the Models tab.
- Your downloaded models will appear here as cards.
- Click the Load button on the model you want to use.
- Advanced Settings: Before loading, you can click the settings icon on the model card to adjust parameters like Context Length and GPU Offloading.
- For a complete guide on all available options, see Model Settings & Parameters.
- Watch the progress bar. Once it turns green and says "Loaded," you are ready!
Model Structure¶
Each model card includes a Model Structure viewer that provides a detailed breakdown of the model's internal architecture. Click the structure icon on a model card to open the modal.
Overview¶

The overview section displays:
- Model Overview: Architecture type (e.g., GEMMA3N), tensor count, and number of layers.
- Quantization: Compression method (e.g., Q4KM), compression ratio, original vs. quantized bit depth, and quality-vs-size trade-off visualization.
- Dimensions: Embedding size, vocabulary size, and context length shown as proportional bars.
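The compression ratio in this panel follows directly from the bit depths. As a rough illustration (real GGUF quantization schemes such as Q4_K_M mix bit widths across tensors, so the effective bits per weight is fractional; the numbers below are assumptions, not values from the app):

```python
# Rough quantization size estimate. Real quant formats store scales and
# mixed-precision tensors, so actual ratios differ slightly.
def approx_compression_ratio(original_bits: float, quantized_bits: float) -> float:
    return original_bits / quantized_bits

# FP16 weights (16 bits) quantized to ~4.5 effective bits per weight
ratio = approx_compression_ratio(16, 4.5)
print(f"~{ratio:.1f}x smaller")  # ~3.6x smaller
```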
Model Flow & Layer Stack¶

The model flow section visualizes:
- Model Flow: The data pipeline from Input Tokens through Embedding, Transformer layers, Output, and Vocabulary.
- Layer Stack: The layer hierarchy including Input Embedding, individual transformer layers, and Output Head.
- KV Cache: Context capacity, estimated KV cache size, and memory estimation details (per-layer size, head dimensions, precision).
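The KV cache estimate shown here can be reproduced with the standard formula: two tensors (K and V) per layer, each sized context length × KV heads × head dimension, times the bytes per element. A minimal sketch (illustrative parameter values; the app's estimator may include additional overhead):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   context_len: int, bytes_per_elem: int = 2) -> int:
    # 2 tensors (K and V) per layer, each [context_len, n_kv_heads, head_dim]
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem

# Example: 32 layers, 8 KV heads, head dimension 128,
# 8K context at FP16 (2 bytes per element)
size = kv_cache_bytes(32, 8, 128, 8192)
print(f"{size / 2**30:.2f} GiB")  # 1.00 GiB
```

This is why halving the context length or using a model with fewer KV heads (higher GQA ratio) cuts cache memory proportionally.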
Transformer Layer Details¶

Clicking on a transformer layer reveals:
- Multi-Head Attention: Number of query (Q) heads, KV heads, head dimension, and GQA ratio.
- Grouped Query Attention (GQA): A visual diagram showing how query heads are grouped and share KV heads, helping you understand the model's attention efficiency.
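The GQA ratio is simply query heads divided by KV heads. The grouping shown in the diagram can be sketched as follows (illustrative function, not the app's internals):

```python
def gqa_groups(n_q_heads: int, n_kv_heads: int) -> dict:
    # In grouped-query attention, each KV head is shared by a
    # contiguous group of query heads.
    assert n_q_heads % n_kv_heads == 0, "query heads must divide evenly"
    ratio = n_q_heads // n_kv_heads
    return {kv: list(range(kv * ratio, (kv + 1) * ratio))
            for kv in range(n_kv_heads)}

# 32 query heads sharing 8 KV heads -> groups of 4
groups = gqa_groups(32, 8)
print(len(groups[0]))  # 4
```

Fewer KV heads mean a smaller KV cache (see the KV Cache panel) at a modest cost in attention expressiveness.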
Position Encoding & Normalization¶

This section shows the model's positional encoding and normalization parameters:
- Position Encoding (RoPE): Explains how Rotary Position Embedding works, with a position-based rotation visualization and frequency spectrum display.
- Normalization: RMS epsilon value used for layer normalization.
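The rotation visualization reflects the core idea of RoPE: each (even, odd) feature pair is rotated by an angle proportional to the token position, with a frequency that decays across the embedding dimension. A minimal sketch (the base of 10000 is the common default, not necessarily this model's value):

```python
import math

def rope_rotate(x0: float, x1: float, pos: int,
                dim_pair_idx: int, dim: int, base: float = 10000.0):
    # Rotate one (even, odd) feature pair by a position-dependent angle;
    # low pair indices rotate fast (high frequency), high indices slowly.
    theta = pos * base ** (-2.0 * dim_pair_idx / dim)
    c, s = math.cos(theta), math.sin(theta)
    return x0 * c - x1 * s, x0 * s + x1 * c

# At position 0 every angle is zero, so vectors pass through unchanged.
print(rope_rotate(1.0, 0.0, pos=0, dim_pair_idx=0, dim=128))  # (1.0, 0.0)
```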
The Chat Interface¶
Click the Chat icon in the sidebar to enter the main interface.
Creating Conversations¶
- New Chat: Click the "+" button in the sidebar to start a fresh conversation.
- History: Your previous chats are automatically saved in the sidebar for easy access.
- Search: Use the search bar in the sidebar to find past conversations by keyword.
Interaction Features¶
- Markdown Support: The model can format responses with bold text, lists, and tables.
- Code Highlighting: Programming code in responses is syntax-highlighted, with a "Copy" button for one-click copying.
- LaTeX Support: Mathematical formulas are rendered cleanly.
- Thinking Blocks: Some models (like DeepSeek or specialized reasoning models) can show their internal "thinking" process. Backend.AI GO displays these in a dedicated collapsible block.
Understanding Chat Parameters¶
In the chat interface, you can find a "Parameters" drawer (usually a gear icon on the top right) to fine-tune the model's behavior:
- Temperature: Controls "creativity." Lower (0.1) is more focused and predictable; higher (0.8+) is more creative and random.
- Top P (nucleus sampling): Limits sampling to the smallest set of tokens whose cumulative probability reaches P, trimming unlikely tokens from the tail.
- Repeat Penalty: Penalizes recently generated tokens so the model does not get stuck repeating itself.
- System Prompt: Give the model a "personality" or specific instructions (e.g., "You are a helpful coding assistant" or "Speak like a pirate").
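Temperature and Top P interact: temperature rescales the logits before softmax, then top-p trims the candidate set. A minimal illustration of that pipeline (not the app's actual sampler):

```python
import math
import random

def sample(logits, temperature=0.7, top_p=0.9, rng=None):
    rng = rng or random.Random(0)
    # Temperature scales logits before softmax: <1 sharpens, >1 flattens.
    probs = [math.exp(l / temperature) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    # Top-p keeps the smallest set of tokens whose cumulative
    # probability reaches top_p, then samples from that set.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]

# Very low temperature makes the top token dominate: near-greedy decoding.
print(sample([2.0, 1.0, 0.1], temperature=0.1))  # 0
```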
Model Status in Header¶
When a model is loaded, the header displays a Model Status Pill showing:
- Model Name: The display name of the currently loaded model
- Memory Usage: How much RAM/VRAM the model is using (e.g., "2.3 GB")
- Context Usage: A visual bar showing context token usage (e.g., "0/8K")
Quick Actions Popover¶
Click the status pill to open a detailed popover with:
- Full Model Path: Where the model file is located on disk
- Memory Details: Memory usage with a progress bar relative to system total
- Context Details: Token usage with percentage
- Load Time: When the model was loaded (relative time like "2 hours ago")
- Uptime: How long the model has been running
- Unload Model: Quickly free resources without navigating to the Models tab
- Model Settings: Jump directly to model configuration
This provides a convenient way to monitor and manage your loaded model from anywhere in the application.
Unloading Models¶
When you are finished, or want to switch to a different model:
- Go back to the Models tab.
- Click Unload.
- Alternatively, click the Model Status in the header and select Unload Model from the popover.
This frees up your system RAM/VRAM for other tasks.
Batch Operations¶
When you have many models, Backend.AI GO provides batch operations to manage multiple models at once.

Entering Selection Mode¶
- Go to the Models tab.
- Click the Select button in the page header to enter selection mode.
- Model cards will now show checkboxes for selection.
Selecting Models¶
- Click on a model card to toggle its selection.
- Shift+Click to select a range of models (from the last selected model to the clicked one).
- Cmd+Click (macOS) or Ctrl+Click (Windows/Linux) to toggle an individual model's selection.
- Use Select all to select all visible models.
- Use Deselect all to clear your selection.
Batch Delete¶
- Select the models you want to delete.
- Click the Delete button in the floating action bar at the bottom.
- A confirmation dialog will appear showing the list of models to be deleted.
- Click Delete to confirm. A progress bar shows the deletion status.
- If any deletions fail, an error summary is displayed.
Exiting Selection Mode¶
Click Exit selection or press Escape to leave selection mode and return to normal view.
Model Package Export and Import¶
Backend.AI GO supports a portable .baimodel package format that allows you to export and import models with all their metadata intact. This is useful for:
- Transferring models between computers
- Sharing models with colleagues
- Backing up models with their configuration
Exporting a Model¶
- Go to the Models tab.
- Find the model you want to export.
- Right-click (or long-press on touch devices) to open the context menu.
- Select Export as Package.
- In the export dialog:
    - Review the model information and file sizes.
    - For vision models, optionally include the mmproj (multimodal projector) file.
    - Choose a save location for the .baimodel package.
- Click Export to begin. A progress bar shows the packaging status.
The exported package contains:
- The model file(s) in their original format
- Package manifest with model metadata
- SHA256 checksums for integrity verification
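The SHA256 checksums in the manifest allow the importer to detect corruption or tampering. Verification amounts to hashing the extracted file and comparing hex digests, along these lines (hypothetical manifest structure; the actual package layout may differ):

```python
import hashlib

def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    # Stream the file in chunks so multi-gigabyte model files
    # are never loaded into memory at once.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

# Hypothetical manifest entry check:
# manifest = {"model.gguf": "2cf24d..."}
# ok = sha256_of("model.gguf") == manifest["model.gguf"]
```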
Importing a Package¶
- Go to the Models tab.
- Click the Import Package button in the header.
- Select the .baimodel file you want to import.
- The import dialog shows:
    - Validation status of the package
    - Model information (name, format, size)
    - Any warnings or errors
- Click Import to extract the package.
- The model will be placed in your models directory and appear in the model list.
Package Features¶
- Integrity Verification: SHA256 checksums are calculated during export and verified during import to ensure data integrity.
- Security Checks: Packages are validated against path traversal attacks, symlinks, and ZIP bombs.
- Progress Tracking: Both export and import operations show detailed progress, including phase, speed, and estimated time remaining.
- Atomic Operations: Export uses atomic file writes to prevent partial packages on failure.
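The path-traversal check mentioned above typically works by resolving each archive member path against the extraction root and rejecting anything that escapes it. A minimal sketch of that logic (an assumption about the approach, not the app's actual validator):

```python
import os

def is_safe_member(extract_root: str, member_name: str) -> bool:
    # Reject absolute paths outright, then resolve the joined path and
    # require it to remain inside the extraction root after normalization.
    if os.path.isabs(member_name):
        return False
    root = os.path.realpath(extract_root)
    target = os.path.realpath(os.path.join(extract_root, member_name))
    return target == root or target.startswith(root + os.sep)

print(is_safe_member("/tmp/models", "weights/model.gguf"))  # True
print(is_safe_member("/tmp/models", "../../etc/passwd"))    # False
```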