Frequently Asked Questions (FAQ)

1. General

What is Backend.AI GO?

Backend.AI GO is a desktop application that lets you run powerful AI models (LLMs) locally on your own computer. It provides a chat interface, model management, and agent capabilities without relying on cloud services.

Is it completely free?

Yes. The application itself is free and open-source (Apache 2.0). Downloading and running local models is also free. You only pay if you choose to connect paid cloud services like OpenAI or Anthropic.

Does it require an internet connection?

An internet connection is required to download models or use Cloud Integration features. Once a local model is downloaded, you can chat with it completely offline.

Is my data private?

Yes. When you use local models (like Gemma 3 or Qwen3), your chats and documents never leave your computer. They are processed entirely on your own CPU and GPU.

Exception: Cloud Integration

If you explicitly use "Cloud Integration" features (e.g., chatting with GPT-5.2 or Claude 4.5), your prompts and attached documents are sent to that specific provider's API for processing.

Can I use this for commercial work?

Yes, the Backend.AI GO application itself can be used commercially. However, each AI model has its own license (e.g., Llama 3 Community License, Apache 2.0). Please check the license of the specific model you are using on Hugging Face.

Do I need to log in?

No. Backend.AI GO works locally without any account registration.

Is there a mobile app?

Currently, Backend.AI GO is a desktop-only application (Windows, macOS, Linux).

2. Installation & Updates

Which operating systems are supported?

  • Windows: Windows 10 (version 1809 or later) and Windows 11 (x64).
  • macOS: macOS 13 (Ventura) or later (Apple Silicon required).
  • Linux: Ubuntu 22.04+ and other major distributions (AppImage/Debian).

Does it update automatically?

Yes, the application checks for updates on startup and will notify you when a new version is available.

Can I change the installation directory?

On Windows, you can choose the install location during setup. On macOS, you can move the app to any folder.

Permission issues on Linux?

If using the AppImage, ensure you have given it execute permissions: chmod +x Backend.AI-GO.AppImage.

My antivirus flagged the app.

This can happen with new open-source software. Our builds are signed, but some strict heuristics might flag them. You can safely add an exception or submit a false positive report to your antivirus vendor.

How do I uninstall?

  • Windows: Use "Add or remove programs".
  • macOS: Drag the app to Trash.
  • Linux: Delete the AppImage file.

Note: This does not delete your downloaded models or chat history. You must delete the data folder manually if you want a full cleanup.

3. Hardware & Performance

What are the minimum requirements?

  • RAM: 8GB (for small models like 7B Q4). 16GB+ is recommended.
  • Disk: At least 20GB free space for models.
  • OS: macOS 13 (Ventura) or later (Apple Silicon required), Windows 10/11, or Linux.

What graphics card do you recommend?

An NVIDIA GPU (RTX 5090, DGX Spark, or better) or an AMD AI Max 395+ (Strix Halo) APU is highly recommended. More VRAM (12GB+) is better for larger models.

Do you support AMD GPUs?

Yes. On Linux, we support ROCm. On Windows, basic support is available via Vulkan, but NVIDIA/CUDA provides a smoother experience.

How is performance on Apple Silicon (M1-M5)?

Excellent. Backend.AI GO uses MLX and Metal, making Apple Silicon Macs one of the most efficient platforms for local AI. An M3 Max or M4 Pro is comparable to high-end desktop GPUs for inference.

Why is the AI generating text slowly?

Local inference speed depends heavily on your hardware.

  • PC: Running on the CPU alone is slow (2-5 tokens/sec); use a GPU if you have one.
  • Tip: Try a smaller model (e.g., 4B/8B) or a more heavily quantized version (Q4_K_M).

Can I run 70B, 100B+, or 200B+ models?

Yes, if you have enough RAM/VRAM.

  • Mac: An M-series Mac with 48GB+ Unified Memory is ideal. For models like Qwen3-235B-A22B, you will need 128GB+ RAM.
  • PC: You need about 40GB+ of VRAM (e.g., RTX Pro 6000, DGX Spark, or dual RTX 5090). Alternatively, AMD AI Max 395+ systems with 128GB unified memory can comfortably handle 70B and even 100B-235B models such as Solar-Open-100B or gpt-oss-120B.

My laptop battery drains fast.

AI inference is computationally intensive. It's recommended to plug in your laptop for long sessions.

The fan noise is loud.

This is normal. The AI utilizes your GPU/CPU heavily, causing fans to spin up to cool the system.

4. Model Management

Which model should I download?

For the best experience, we recommend choosing a model based on your needs and hardware:

  • General Purpose: Qwen3-8B-Instruct or Gemma 3-4B-Instruct.
  • Coding: Qwen3-Coder-7B (state-of-the-art for its size) or Codestral-22B (highly optimized for 80+ programming languages).
  • Speed (Low Resource): Qwen3-4B or Gemma 3-4B.
  • High Performance (Heavy): gpt-oss-120B, Solar-Open-100B, or Qwen3-235B-A22B. These require high VRAM (32GB+) or 128GB+ RAM.
  • Extreme/Enterprise: GLM-4.7 or Kimi K2 1T. These massive models are best used by connecting to a Backend.AI Cluster.

What is the difference between GGUF and MLX?

  • GGUF: A universal format that runs on CPU and almost all GPUs. Best for Windows/Linux.
  • MLX: Optimized specifically for Apple Silicon Macs. Recommended for Mac users.

What is "Quantization" (Q4, Q8, FP16)?

Quantization reduces model precision to save memory.

  • Q4_K_M (4-bit): Recommended. Best balance of speed and quality.
  • Q8_0 (8-bit): Higher quality, about twice the memory of Q4.
  • FP16: Original quality. Usually too big for consumer hardware.
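
As a rule of thumb, the on-disk weight size is roughly the parameter count times the bytes per parameter (about 0.5 for 4-bit, 1 for 8-bit, 2 for FP16). A back-of-the-envelope sketch in Python (ballpark figures only; real GGUF files add metadata and mix precisions across layers):

  # Rough weight-size estimate: parameters x bytes per parameter.
  # Ballpark only: real files add metadata and mix precisions.
  BYTES_PER_PARAM = {"Q4": 0.5, "Q8": 1.0, "FP16": 2.0}

  def weight_size_gb(params_billions, quant):
      # N billion params x B bytes/param = N*B gigabytes
      return params_billions * BYTES_PER_PARAM[quant]

  for quant in ("Q4", "Q8", "FP16"):
      print(f"7B @ {quant}: ~{weight_size_gb(7, quant):.1f} GB")
  print(f"70B @ Q4: ~{weight_size_gb(70, 'Q4'):.1f} GB")

This prints roughly 3.5, 7, and 14 GB for a 7B model, and about 35 GB for a 70B at Q4, which is why 70B-class models call for 40GB+ of VRAM once the KV cache and runtime overhead are included.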

Where are downloaded models stored?

By default, in your OS's application data folder. You can see and change this path in Settings > Storage.

Can I store models on an external drive?

Yes. Change the model storage path in Settings to a folder on your external drive.

Can I import my own .gguf files?

Yes. Go to the Models tab and click Import.

I deleted a model but space didn't free up.

Check your OS Trash/Recycle Bin. Also, ensure the model wasn't just removed from the list but actually deleted from disk.

5. Chat & Features

Can I chat in languages other than English?

Yes. Most modern models (Qwen, Llama 3, Gemma) support multiple languages including Korean, Spanish, French, etc.

Can it see images?

Yes. Backend.AI GO supports multimodal models (like Llama-3.2-Vision, Qwen-VL) and cloud vision models (GPT-5.2, Claude 4.5). Drag and drop an image into the chat to discuss it.

Can I summarize PDFs?

Currently, you can paste text from PDFs. Direct PDF file upload and parsing is a planned feature for the next update.

Where are my chat logs stored?

Chat history is stored locally on your device, with one JSON file per conversation. We do not have access to your history.

Can I export my chats?

Yes, you can export chats to Markdown or JSON format from the conversation menu.

Can I run code?

Yes. In Agent Mode, if you enable the Code Execution tool, the AI can write and run Python code to solve math problems or create charts.
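
For illustration, asking the agent to "plot y = x^2 for x from -10 to 10" might lead it to generate and execute a script along these lines (the exact code the model writes will vary):

  # Illustrative: the kind of script the agent might generate and run.
  import matplotlib.pyplot as plt

  xs = [x / 10 for x in range(-100, 101)]
  ys = [x ** 2 for x in xs]

  plt.plot(xs, ys)
  plt.title("y = x^2")
  plt.xlabel("x")
  plt.ylabel("y")
  plt.savefig("parabola.png")  # the saved chart can be shown in the chat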

Can it search the internet?

Yes. In Agent Mode, enable the Web Search tool. This allows the model to search Google/DuckDuckGo for real-time information.

Does it support Voice Mode?

Voice input/output is currently an experimental beta feature and will be enabled in an upcoming release.

6. Agents & Tools

What is an Agent?

An Agent is an AI mode that can "do" things, not just "say" things. It plans steps and uses tools to accomplish goals.

Which models support Tool Calling?

Look for the "Tool" tag on the model card. Models like Gemma 3, Qwen3, and GPT-4o are best for this.

Will the agent delete my files? (Security)

Backend.AI GO has a Risk Permission System. Any dangerous action (like deleting a file) requires your explicit approval before it runs. The agent cannot bypass this.

Can I see the "Thinking" process?

Yes. For reasoning models (like DeepSeek-R1), a Thinking Block will appear in the chat. Click to expand and see the AI's internal monologue.

Can I create custom tools?

Currently, you can use the built-in tools. Support for custom Python/JS tools via plugins is coming in a future release.

The agent is stuck in a loop.

This can happen with weaker models. Click the Stop button and try rephrasing your request, or switch to a stronger model (e.g., from a 7B to a 70B or a cloud model).

7. Cloud Integration

Where do I get an OpenAI API Key?

Visit the OpenAI Platform to sign up and generate a key.

Do I have to pay to use Cloud Models?

Yes. Backend.AI GO is free, but providers like OpenAI and Anthropic charge per token usage. You pay them directly; we do not take a cut.

Can I connect to my company's internal API?

Yes. Use the OpenAI Compatible provider option. You can point Backend.AI GO to any endpoint (e.g., http://internal-server:8000/v1) that supports the OpenAI format.
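
This works because the provider only needs to accept standard OpenAI-style chat-completion requests. For reference, a minimal Python sketch of the same kind of call (the base_url, model name, and key are placeholders for your own deployment):

  # Minimal sketch of an OpenAI-format request to a custom endpoint.
  # base_url, api_key, and model are placeholders for your own server.
  from openai import OpenAI

  client = OpenAI(
      base_url="http://internal-server:8000/v1",
      api_key="placeholder",  # many self-hosted servers accept any key
  )

  response = client.chat.completions.create(
      model="your-model-name",
      messages=[{"role": "user", "content": "Hello!"}],
  )
  print(response.choices[0].message.content)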

How do I connect to a remote vLLM server?

Add a vLLM provider and enter your server's IP address and port (e.g., http://192.168.1.50:8000/v1).
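
To confirm the server is reachable before adding it, you can query the standard /v1/models listing that vLLM's OpenAI-compatible server exposes. A quick check using the example address above:

  # Quick reachability check against a vLLM server's model listing.
  import requests

  resp = requests.get("http://192.168.1.50:8000/v1/models", timeout=5)
  resp.raise_for_status()
  for model in resp.json()["data"]:
      print(model["id"])  # model IDs the server is currently serving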

Are my API keys safe?

Yes. Keys are stored in your OS's secure keychain (macOS Keychain, Windows Credential Manager) and are never sent to our servers.

8. Troubleshooting

Model loading takes forever.

Large models take time to load into RAM. If loading takes more than 3 minutes, you are likely running out of memory. Try a smaller model.

"OOM (Out Of Memory)" Error.

Your model is too big for your RAM/VRAM. Try a more aggressive quantization (e.g., Q4_K_M instead of Q8_0), or a smaller parameter count (7B instead of 14B).

The model only speaks English.

Add a System Prompt: "You are a helpful assistant who speaks fluent [Your Language]."

I found a bug. Where do I report it?

Please open an issue on our GitHub Repository.

I want to contribute!

We welcome contributions! Check out our Contribution Guide on GitHub.