2.7. Memory¶
The Memory system lets you store persistent facts, preferences, and context that the AI references across every conversation. Unlike the ephemeral context window that resets each session, memory entries survive indefinitely—so the AI "remembers" your project conventions, personal preferences, or domain knowledge without you repeating them.
What is Memory?¶
Large language models normally have no long-term recall. Each conversation starts from a blank slate, limited to whatever fits inside the model's context window. Memory changes that by maintaining a local store of facts that Backend.AI GO automatically injects into every chat session.
Think of it as a personal knowledge base that travels with your AI assistant.
Key Concepts¶
-
Memory Banks (Namespaces) — Organized groups that hold related entries. For example, you might have a "Work Projects" bank and a "Coding Style" bank. Each bank can be individually enabled or disabled, so you control exactly which context the AI sees.
-
Memory Entries — Individual facts stored inside a bank. Each entry has content text, optional tags for categorization, a source type (Manual or Auto), and timestamps.
-
Memory Injection — When you send a message, Backend.AI GO collects entries from all enabled banks, formats them into a system message, and prepends it to the API request. The model reads this context before answering.
-
Auto-Extraction — The AI can automatically identify useful facts from your conversations and save them as new entries, so your memory grows organically without manual effort.
-
Deduplication & Consolidation — Over time, auto-extracted entries may overlap. The system can merge similar entries and prune stale ones to keep your memory concise.
Getting Started¶
-
Open Settings > Memory and ensure the memory system is enabled (it is on by default).
-
Create your first memory bank by clicking New memory bank. Give it a descriptive name like "Personal Preferences" or "Project Notes".
-
Add entries manually, or simply chat with the AI and let auto-extraction populate the bank over time.
Memory Indicator¶
When memory is active, a database icon appears in the chat header next to the tool selector.
-
Badge Count — The icon displays the number of enabled memory banks.
-
Popover — Click the database icon to open a quick-access panel showing:
- A toggle to enable or disable memory injection for conversations.
- A list of active banks with their entry counts.
- Token usage showing how much of the context budget is consumed.
- A button to open the full Memory Viewer.
Hiding the Indicator
If you prefer a cleaner chat header, you can hide the database icon in Settings > Memory > Show memory indicator without disabling the memory system itself.
How Memory Injection Works¶
When you send a message with memory enabled:
-
Backend.AI GO collects entries from all enabled banks.
-
Entries are formatted into a structured block and prepended as a system message in the API request. This message is not visible in the chat UI—only the model sees it.
-
The system respects your configured Max tokens budget (default 2000). If the total entries exceed the budget, entries are truncated to fit.
-
If building the context fails for any reason (e.g., backend unavailable), the conversation proceeds normally without memory. This graceful degradation ensures chat always works.
Saving Messages to Memory¶
You can save any chat message directly to your memory:
-
Right-click on a message (or click the three-dot menu) and select Save to Memory.
-
In the dialog:
- Review or edit the content to save.
- Select a target bank (namespace) from the dropdown.
- Optionally add comma-separated tags for organization.
-
Click Save. The entry is immediately available for future memory injection.
Memory Must Be Enabled
The "Save to Memory" option only appears in the context menu when the memory system is enabled in Settings > Memory.
Managing Memory Banks¶

Memory banks are managed from the Settings > Memory tab:
-
Create — Add a new bank with a name and optional description.

-
Edit — Modify the name or description of an existing bank.
-
Delete — Permanently remove a bank and all its entries. This action cannot be undone.
-
Enable / Disable — Toggle individual banks on or off without deleting them. Disabled banks are excluded from memory injection.
-
Import / Export — Export a bank as a JSON file for backup or sharing, or import one from a JSON file (up to 10 MB).
Memory Viewer¶

The Memory Viewer is a side drawer for browsing, searching, and managing individual entries within a bank.
Opening the Viewer¶
Click the View button on any bank card in Settings > Memory, or click the button inside the Memory Indicator popover.
Browsing Entries¶
-
Entries are sorted by most recently updated.
-
Each entry card shows the content (with Markdown rendering), a source badge (Manual or Auto), tags prefixed with
#, and a relative timestamp. -
Long entries are collapsed to the first six lines by default. Click the expand button to reveal the full content.
Adding, Editing, and Deleting¶
-
Add — Click the Add button to create a new entry with content, tags, and a source type.
-
Edit — Click the edit icon on an entry card to modify its content and tags.
-
Delete — Click the delete icon to remove an entry (a confirmation dialog appears first).
-
Clear Bank — Click Clear Namespace in the footer to remove all entries from the current bank at once (with confirmation).
Searching¶
Use the search bar at the top to filter entries by content across all banks or within a specific one.
Auto-Extraction¶

Auto-extraction lets the AI identify and store useful facts from your conversations automatically.
How It Works¶
-
After every few assistant responses (default: every 5), the system checks whether new information worth remembering has appeared.
-
Recent conversation turns (up to 10 by default) are sent to the inference server along with existing entries for deduplication context.
-
The AI decides which facts to create as new entries or update in existing ones.
-
New entries are stored in a dedicated "Auto-extracted" bank (created automatically if it does not exist).
Safety Limits¶
-
A maximum of 20 extraction actions per cycle prevents runaway entry creation.
-
Auto-extracted entry content is capped at 10,000 characters; manual entries allow up to 50,000.
-
Each entry can have at most 10 tags.
Deduplication & Consolidation¶
Over time, auto-extracted entries may contain overlapping or redundant information. The consolidation process cleans this up.
-
Similarity Matching — Entries are compared using bigram-based Jaccard similarity. The default threshold is 0.85 (on a 0–1 scale), meaning entries that share 85 % or more of their text structure are considered duplicates.
-
Merging — When similar entries are found, the system keeps the longest content, unions all tags, and records which entries were merged.
-
Pruning — Auto-extracted entries older than 90 days (configurable) that have rarely been referenced can be automatically removed. Entries that have been extracted or referenced multiple times are preserved.
Manual Entries Are Never Touched
Consolidation only operates on auto-extracted entries. Your manually saved entries are always left intact.
Settings Reference¶
All memory settings are found in Settings > Memory.
| Setting | Description | Default |
|---|---|---|
| Enable memory system | Turn the entire memory system on or off. | On |
| Show memory indicator | Display the database icon in the chat header. | On |
| Max memory tokens | Maximum token budget for memory injection (500–8000). Higher values provide more context but consume more of the model's context window. | 2000 |
Namespace-level controls (per bank):
| Control | Description |
|---|---|
| Enabled toggle | Include or exclude this bank from memory injection. |
| Import / Export | Transfer banks as JSON files. |
| View | Open the Memory Viewer drawer. |
| Delete | Permanently remove the bank and all entries. |
Tips & Best Practices¶
-
Organize by topic — Create separate banks for different domains (e.g., "Work", "Coding", "Personal"). This makes it easy to toggle context on and off as needed.
-
Use tags — Tags like
#python,#preferences, or#project-xhelp you filter and find entries quickly in the Memory Viewer. -
Review auto-extracted entries — Periodically open the Memory Viewer to check what the AI has stored automatically. Edit or delete entries that are inaccurate or no longer relevant.
-
Mind the token budget — A higher Max memory tokens value gives the AI more context but leaves less room for the conversation itself. Start with the default (2000) and increase only if needed.
-
Export for backup — Before making large changes, export your important banks as JSON files so you can restore them if needed.