10.4. Frequently Asked Questions (FAQ)¶
1. General¶
What is Backend.AI GO?¶
Backend.AI GO is a desktop application that lets you run powerful AI models (LLMs) locally on your own computer. It provides a chat interface, model management, and agent capabilities without relying on cloud services.
Is it completely free?¶
Yes. The application itself is free and open-source (Apache 2.0). Downloading and running local models is also free. You only pay if you choose to connect paid cloud services like OpenAI or Anthropic.
Does it require an internet connection?¶
An internet connection is required to download models or use Cloud Integration features. Once a local model is downloaded, you can chat with it completely offline.
Is my data private?¶
Yes. When you use local models (like Gemma 3 or Qwen3), your chats and documents never leave your computer. They are processed entirely on your own CPU and GPU.
Exception: Cloud Integration
If you explicitly use "Cloud Integration" features (e.g., chatting with GPT-5.2 or Claude 4.5), your prompts and attached documents are sent to that specific provider's API for processing.
Can I use this for commercial work?¶
Yes, regarding the Backend.AI GO application itself. However, each AI model has its own license (e.g., Llama 3 Community License, Apache 2.0). Please check the license of the specific model you are using on Hugging Face.
Do I need to log in?¶
No. Backend.AI GO works locally without any account registration.
Is there a mobile app?¶
Currently, Backend.AI GO is a desktop-only application (Windows, macOS, Linux).
2. Installation & Updates¶
Which operating systems are supported?¶
- Windows: Windows 10 (version 1809 or later) and Windows 11 (x64).
- macOS: macOS 13 (Ventura) or later (Apple Silicon required).
- Linux: Ubuntu 22.04+ and other major distributions (AppImage/Debian).
Does it update automatically?¶
Yes, the application checks for updates on startup and will notify you when a new version is available.
Can I change the installation directory?¶
On Windows, you can choose the install location during setup. On macOS, you can move the app to any folder.
Permission issues on Linux?¶
If using the AppImage, ensure you have given it execute permission: `chmod +x Backend.AI-GO.AppImage`.
My antivirus flagged the app.¶
This can happen with new open-source software. Our builds are signed, but some strict heuristics might flag them. You can safely add an exception or submit a false positive report to your antivirus vendor.
macOS says the app is from an "unidentified developer".¶
This is a standard macOS Gatekeeper warning for apps downloaded outside the App Store. Go to System Settings > Privacy & Security, find the blocked app message, and click "Open Anyway". For detailed steps, see the Troubleshooting Guide.
macOS shows "App is damaged and can't be opened".¶
This message can appear when macOS quarantines a downloaded file. Open Terminal and run the following command to clear the quarantine attribute (adjust the path if the app is installed somewhere other than /Applications):
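# Clear the quarantine attribute (the exact .app name may differ on your system)
xattr -cr "/Applications/Backend.AI GO.app"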
Then try opening the app again. For more details, see the Troubleshooting Guide.
Which binary should I download for my Mac — Apple Silicon or Intel?¶
Backend.AI GO requires Apple Silicon (M1 or later). There is no Intel Mac version. If you are on an Intel-based Mac, use the Cloud Integration features to connect to a remote inference endpoint instead of running models locally.
Windows shows "Windows protected your PC" when I try to install.¶
This is the Windows SmartScreen warning. Click "More info", then click "Run anyway" to proceed with the installation. The app is safe; SmartScreen flags new software that has not yet built a reputation score with Microsoft.
My antivirus removed or quarantined files after installation on Windows.¶
The inference engine binary (`llama-server-x86_64-pc-windows-msvc.exe`) may be flagged as a false positive by some antivirus software. To resolve this:
- Open your antivirus software and restore the quarantined file if available.
- Add exceptions for the Backend.AI GO directories:
  - %LOCALAPPDATA%\Programs\backend.ai.go\
  - %LOCALAPPDATA%\backend.ai.go\engines\
  - %APPDATA%\backend.ai.go\
- Reinstall the application after adding the exception.
You can also report the false positive to your antivirus vendor to help improve future detections.
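If you use Microsoft Defender, the exception can also be added from an elevated PowerShell prompt. This is a sketch assuming the default install path shown above; other antivirus products have their own exception settings:
# Exclude the default install directory from Defender scans (run as Administrator)
Add-MpPreference -ExclusionPath "$env:LOCALAPPDATA\Programs\backend.ai.go"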
The app installs but crashes on Windows — "VCRUNTIME140.dll not found".¶
Backend.AI GO requires the Microsoft Visual C++ Redistributable. Download and install it from the Microsoft Download Center.
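If you use the winget package manager, the x64 redistributable can typically be installed from a terminal as well (the package ID below reflects the current winget catalog and may change):
winget install --id Microsoft.VCRedist.2015+.x64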
GPU acceleration is not working on Windows.¶
For NVIDIA GPU acceleration:
- Install the latest NVIDIA Game Ready or Studio drivers from nvidia.com/drivers.
- CUDA 12.x is required. You can verify with `nvidia-smi` in a command prompt.
- If using an older driver, update it and restart the application.
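For example, `nvidia-smi` prints the driver and CUDA versions at the top of its output; confirm that the reported CUDA version is 12.x:
nvidia-smi
# The output header should show something like "Driver Version: ...  CUDA Version: 12.x"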
The AppImage does not run on Linux.¶
Two common causes:
- Missing execute permission: Run `chmod +x Backend.AI-GO.AppImage` before launching.
- Missing FUSE: Many distributions require FUSE 2 to run AppImages. Install it with:
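# Ubuntu / Debian (the FUSE 2 package name may differ on other distributions)
sudo apt install libfuse2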
The app fails to start on Linux with "libwebkit2gtk" or similar errors.¶
Backend.AI GO depends on WebKitGTK. Install the missing libraries:
# Ubuntu / Debian
sudo apt install libwebkit2gtk-4.1-0 libgtk-3-0
# Fedora
sudo dnf install webkit2gtk4.1 gtk3
GPU acceleration is not working on Linux.¶
For NVIDIA GPUs, ensure the CUDA driver is installed and `nvidia-smi` returns output. For AMD GPUs with ROCm:
- Install ROCm from rocm.docs.amd.com.
- Verify with `rocminfo` after installation.
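For example, a quick sanity check that ROCm detects your GPU (the gfx string identifies the GPU architecture):
rocminfo | grep -i gfx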
How do I uninstall?¶
- Windows: Use "Add or remove programs" in Windows Settings.
- macOS: Drag the app to Trash.
- Linux: Delete the AppImage file.
Note: This does not delete your downloaded models or chat history. You must delete the data folder manually if you want a full cleanup.
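On Windows, for example, a full cleanup means deleting the data directories listed in the antivirus section above. This sketch assumes the default paths and permanently deletes all downloaded models and chat history:
rmdir /s /q "%APPDATA%\backend.ai.go"
rmdir /s /q "%LOCALAPPDATA%\backend.ai.go"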
3. Hardware & Performance¶
What are the minimum requirements?¶
- RAM: 8GB (for small models like 7B Q4). 16GB+ is recommended.
- Disk: At least 20GB free space for models.
- OS: macOS 13 (Ventura) or later (Apple Silicon required), Windows 10/11, or Linux.
What graphics card do you recommend?¶
An NVIDIA GPU (RTX 5090, DGX Spark, or better) or an AMD AI Max 395+ (Strix Halo) APU is highly recommended. More VRAM (12GB+) is better for larger models.
Do you support AMD GPUs?¶
Yes. On Linux, we support ROCm. On Windows, basic support is available via Vulkan, but NVIDIA/CUDA provides a smoother experience.
How is performance on Apple Silicon (M1-M5)?¶
Excellent. Backend.AI GO uses MLX and Metal, making Apple Silicon Macs one of the most efficient platforms for local AI. An M3 Max or M4 Pro is comparable to high-end desktop GPUs for inference.
Why is the AI generating text slowly?¶
Local inference speed depends heavily on your hardware.
- PC: Running on a CPU alone is slow (2-5 tokens/sec). Use a GPU.
- Tip: Try a smaller model (e.g., 4B/8B) or a more heavily quantized version (e.g., `Q4_K_M`).
Can I run 70B, 100B+, or 200B+ models?¶
Yes, if you have enough RAM/VRAM.
- Mac: An M-series Mac with 48GB+ Unified Memory is ideal. For models like Qwen3-235b-a22b, you will need 128GB+ RAM.
- PC: You need about 40GB+ of VRAM (e.g., RTX Pro 6000, DGX Spark, or dual RTX 5090). Alternatively, AMD AI Max 395+ systems with 128GB of unified memory can comfortably handle 70B and even 100B-235B models like Solar-Open-100B or gpt-oss-120B.
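As a rough rule of thumb, model weights occupy about (parameter count × bits per weight) ÷ 8 bytes: a 70B model at 4-bit quantization needs roughly 70 × 4 ÷ 8 ≈ 35GB for the weights alone, plus extra memory for the KV cache and activations, which is why 40GB+ of VRAM is suggested above.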
My laptop battery drains fast.¶
AI inference is computationally intensive. It's recommended to plug in your laptop for long sessions.
The fan noise is loud.¶
This is normal. The AI utilizes your GPU/CPU heavily, causing fans to spin up to cool the system.
4. Model Management¶
Which model should I download?¶
For the best experience, we recommend choosing a model based on your needs and hardware:
- General Purpose: `Qwen3-8B-Instruct` or `Gemma 3-4B-Instruct`.
- Coding: `Qwen3-Coder-7B` (state-of-the-art for its size) or `Codestral-22B` (highly optimized for 80+ languages).
- Speed (Low Resource): `Qwen3-4B` or `Gemma 3-4B`.
- High Performance (Heavy): `gpt-oss-120B`, `Solar-Open-100B`, or `Qwen3-235b-a22b`. These require high VRAM (32GB+) or 128GB+ RAM.
- Extreme/Enterprise: `GLM-4.7` or `Kimi K2 1T`. These massive models are best used by connecting to a Backend.AI Cluster.
What is the difference between GGUF and MLX?¶
- GGUF: A universal format that runs on CPU and almost all GPUs. Best for Windows/Linux.
- MLX: Optimized specifically for Apple Silicon Macs. Recommended for Mac users.
What is "Quantization" (Q4, Q8, FP16)?¶
Quantization reduces model precision to save memory.
- `Q4_K_M` (4-bit): Recommended. Best balance of speed and quality.
- `Q8_0` (8-bit): Higher quality, 2x memory usage.
- FP16: Original quality. Usually too big for consumer hardware.
Where are downloaded models stored?¶
By default, in your OS's application data folder. You can see and change this path in Settings > Storage.
Can I store models on an external drive?¶
Yes. Change the model storage path in Settings to a folder on your external drive.
Can I import my own .gguf files?¶
Yes. Go to the Models tab and click Import.
I deleted a model but space didn't free up.¶
Check your OS Trash/Recycle Bin. Also, ensure the model wasn't just removed from the list but actually deleted from disk.
5. Chat & Features¶
Can I chat in languages other than English?¶
Yes. Most modern models (Qwen, Llama 3, Gemma) support multiple languages including Korean, Spanish, French, etc.
Can it see images?¶
Yes. Backend.AI GO supports multimodal models (like Llama-3.2-Vision, Qwen-VL) and cloud vision models (GPT-5.2, Claude 4.5). Drag and drop an image into the chat to discuss it.
Can I summarize PDFs?¶
Currently, you can paste text from PDFs. Direct PDF file upload and parsing is a planned feature for the next update.
Where are my chat logs stored?¶
Chat history is stored locally in an individual JSON file for each conversation on your device. We do not have access to your history.
Can I export my chats?¶
Yes, you can export chats to Markdown or JSON format from the conversation menu.
Can I run code?¶
Yes. In Cowork, if you enable the Code Execution tool, the AI can write and run Python code to solve math problems or create charts.
Can it search the internet?¶
Yes. In Cowork, enable the Web Search tool. This allows the model to search the web for real-time information.
Does it support Voice Mode?¶
Voice input/output is currently an experimental beta feature and is planned for an upcoming release.
6. Agents & Tools¶
What is an Agent?¶
An Agent is an AI mode that can "do" things, not just "say" things. It plans steps and uses tools to accomplish goals.
Which models support Tool Calling?¶
Look for the "Tool" tag on the model card. Models like Gemma 3, Qwen3, and GPT-4o are best for this.
Will the agent delete my files? (Security)¶
Backend.AI GO has a Risk Permission System. Any dangerous action (like deleting a file) requires your explicit approval before it runs. The agent cannot bypass this.
Can I see the "Thinking" process?¶
Yes. For reasoning models (like DeepSeek-R1), a Thinking Block will appear in the chat. Click to expand and see the AI's internal monologue.
Can I create custom tools?¶
Currently, you can use the built-in tools. Support for custom Python/JS tools via plugins is coming in a future release.
The agent is stuck in a loop.¶
This can happen with weaker models. Click the Stop button and try rephrasing your request, or switch to a more capable model (e.g., from 7B to 70B, or a cloud model).
7. Cloud Integration¶
Where do I get an OpenAI API Key?¶
Visit the OpenAI Platform to sign up and generate a key.
Do I have to pay to use Cloud Models?¶
Yes. Backend.AI GO is free, but providers like OpenAI and Anthropic charge per token usage. You pay them directly; we do not take a cut.
Can I connect to my company's internal API?¶
Yes. Use the OpenAI Compatible provider option. You can point Backend.AI GO to any endpoint (e.g., http://internal-server:8000/v1) that supports the OpenAI format.
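Any endpoint you connect this way must answer standard OpenAI-format requests. A quick way to test from a terminal (the server URL and model name below are placeholders for your own values):
curl http://internal-server:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "my-model", "messages": [{"role": "user", "content": "Hello"}]}'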
How do I connect to a remote vLLM server?¶
Add a vLLM provider and enter your server's IP address and port (e.g., http://192.168.1.50:8000/v1).
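To confirm the server is reachable before adding it, you can query vLLM's OpenAI-compatible model listing endpoint (substitute your own address and port):
curl http://192.168.1.50:8000/v1/models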
Are my API keys safe?¶
Yes. Keys are stored in your OS's secure keychain (macOS Keychain, Windows Credential Manager) and are never sent to our servers.
8. Troubleshooting¶
Model loading stuck forever.¶
Large models take time to load into RAM. If it takes >3 minutes, you might be out of RAM. Try a smaller model.
"OOM (Out Of Memory)" Error.¶
Your model is too big for your RAM/VRAM. Try a more aggressive quantization (e.g., `Q4_K_M`) or a smaller parameter count (7B instead of 14B).
The model only speaks English.¶
Add a System Prompt: "You are a helpful assistant who speaks fluent [Your Language]."
I found a bug. Where do I report it?¶
Please open an issue on our GitHub Repository.
I want to contribute!¶
We welcome contributions! Check out our Contribution Guide on GitHub.
9. Accessibility¶
Does Backend.AI GO support text zoom for visually impaired users?¶
Yes. Backend.AI GO is designed to support WCAG 2.1 Level AA compliance for text resizing (1.4.4 Resize text). The UI uses relative units (rem) that scale properly when you increase your browser or system text zoom up to 200%. All essential content remains accessible without horizontal scrolling.
Can I use keyboard navigation?¶
Yes. All interactive elements in the UI are keyboard accessible. You can use Tab to navigate between elements, Enter or Space to activate buttons, and Escape to close dialogs and drawers.
Does it support screen readers?¶
Backend.AI GO uses semantic HTML and ARIA attributes to improve compatibility with screen readers. We continue to improve accessibility support in each release.
How do I increase the text size?¶
You can increase text size using your operating system's accessibility settings or by zooming in the application window. The UI is designed to remain fully functional at up to 200% zoom without requiring horizontal scrolling.