Memory System

How MUXI remembers context across conversations and sessions

MUXI's four-layer memory system handles everything from immediate conversation context to long-term user knowledge. Automatic tiering, intelligent caching, and semantic search - all built in.

The Four Layers

flowchart TB
    subgraph "Memory System"
        direction TB
        A["<b>Buffer Memory:</b><br>Recent messages (fast, in-memory)"]
        B["<b>Working Memory:</b><br>Active session state (FAISSx)"]
        C["<b>User Synopsis:</b><br>Who is this user? (LLM-synthesized)"]
        D["<b>Persistent Memory:</b><br>Long-term storage (Postgres/SQLite)"]
    end
    A --> B
    B --> C
    C --> D

Layer           Purpose                       Storage
--------------  ----------------------------  -----------------------
Buffer          Recent messages               In-memory
Working         Session state, tool outputs   FAISSx (vector)
User Synopsis   Who the user is               Derived from persistent
Persistent      Long-term facts               Postgres/SQLite

FAISSx: MUXI's Vector Store

MUXI uses FAISSx - our wrapper around Meta's FAISS library - for vector storage in working memory.

Why FAISSx?

  • Can be deployed as a server for multi-instance setups
  • When you deploy formations across multiple servers, they can share memory
  • Fast semantic search for context retrieval
Working memory is configured under the memory block:

memory:
  working:
    provider: faissx
    server: "tcp://faissx.internal:45678"  # Optional: shared server

Multi-Tenancy Requires Postgres

Important: By default (without persistent storage), MUXI supports only a single user with no long-term memory.

Setup                   Users          Long-term Memory   Use Case
----------------------  -------------  -----------------  -------------------
No persistent storage   Single user    ❌ None            Local testing
SQLite                  Single user    ✅ Yes             Simple deployments
Postgres                Multi-tenant   ✅ Yes, per user   Production

To enable multi-tenancy:

memory:
  persistent:
    provider: postgres
    connection_string: ${{ secrets.POSTGRES_URL }}

With Postgres:

  • Each user gets a namespace
  • Memory is segregated by user
  • User synopsis works properly
  • Long-term memory persists across sessions

User Synopsis: "Who Am I Talking To?"

The Overlord always knows who it's communicating with via user synopsis - an LLM-synthesized profile:

User Synopsis for alice@acme.com:
- Name: Alice Johnson
- Role: Product Manager at Acme Corp
- Timezone: PST
- Prefers concise, data-driven responses
- Working on Q4 planning
- Recent topics: API performance, monitoring

How it's built:

  1. User interacts over time
  2. Important facts extracted to persistent memory
  3. LLM synthesizes synopsis on demand
  4. Cached for performance (configurable TTL)
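
A minimal sketch of the cache-with-TTL behaviour behind steps 3 and 4, assuming a simple in-process cache; the function name and the way facts are condensed are illustrative, not MUXI's actual API:

import time

_synopsis_cache: dict[str, tuple[float, str]] = {}  # user_id -> (built_at, synopsis)

def get_user_synopsis(user_id: str, facts: list[str], cache_ttl: int = 3600) -> str:
    """Return a cached synopsis, re-synthesizing it once the TTL expires."""
    now = time.time()
    cached = _synopsis_cache.get(user_id)
    if cached and now - cached[0] < cache_ttl:
        return cached[1]  # still fresh: no LLM call needed
    # In MUXI this step is an LLM call that condenses persistent-memory facts;
    # joining them here keeps the sketch self-contained.
    synopsis = f"User Synopsis for {user_id}:\n- " + "\n- ".join(facts)
    _synopsis_cache[user_id] = (now, synopsis)
    return synopsis

print(get_user_synopsis("alice@acme.com", ["Role: Product Manager at Acme Corp", "Timezone: PST"]))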

Why it matters:

  • Overlord personalizes responses
  • Agents have context about who they're helping
  • No need to repeat background info
Synopsis generation and caching are configured under persistent memory:

memory:
  persistent:
    user_synopsis:
      enabled: true
      cache_ttl: 3600  # Refresh every hour

How Memory Flows

New message arrives
         ↓
Stored in buffer memory
         ↓
Relevant context retrieved from working memory (FAISSx)
         ↓
User synopsis loaded (who is this?)
         ↓
Agent processes request with full context
         ↓
Important information saved to persistent memory
         ↓
Working memory updated with session state

You don't manage this manually - MUXI handles tiering automatically.
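
The same flow expressed as a compact sketch; the class and its method are hypothetical and only mirror the steps above, not MUXI's internals:

class MemoryLayers:
    """Toy illustration of the tiering flow; not MUXI's actual classes."""
    def __init__(self):
        self.buffer: list[str] = []        # recent messages
        self.working: list[str] = []       # session state, tool outputs
        self.persistent: list[str] = []    # long-term facts

    def handle_message(self, user_id: str, message: str) -> str:
        self.buffer.append(message)                                    # 1. store in buffer
        context = self.working[-5:]                                    # 2. retrieve working context
        synopsis = f"synopsis for {user_id}"                           # 3. load user synopsis
        reply = f"reply built from {len(context)} items + {synopsis}"  # 4. process with context
        self.persistent.append(message)                                # 5. save important facts
        self.working.append(reply)                                     # 6. update session state
        return reply

print(MemoryLayers().handle_message("alice@acme.com", "What's the Q4 plan?"))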

Semantic Search

When agents need context, MUXI searches across memory layers:

User:  "What's my API key?"
         ↓
MUXI searches:
  - Buffer: recent conversation
  - Working: tool outputs, session state
  - Persistent: "API key xyz123 shared on Jan 5"
         ↓
Agent: "Your API key is xyz123, from January 5th."

Vector embeddings enable semantic similarity, not just keyword matching.
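
A toy illustration of why this is similarity search rather than keyword matching; the character-count embedding and cosine function below stand in for a real embedding model and FAISSx index:

import math

def embed(text: str) -> list[float]:
    # Trivial stand-in embedding: letter-frequency vector (real systems use an embedding model).
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

memories = [
    "API key xyz123 shared on Jan 5",
    "User prefers concise answers",
]
query = "What's my API key?"
best = max(memories, key=lambda m: cosine(embed(query), embed(m)))
print(best)  # -> "API key xyz123 shared on Jan 5"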

User Isolation

Each user's memory is completely isolated (when using Postgres):

User A: "My password is secret123"
        → stored in user_a namespace

User B: "What's my password?"
        → searches user_b namespace
        → "I don't have that information"

No data leaks between users.
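
Conceptually, isolation comes from keying every read and write by the caller's namespace; a hypothetical sketch, not MUXI's storage API:

from collections import defaultdict

store: dict[str, list[str]] = defaultdict(list)  # namespace -> facts

def remember(user_id: str, fact: str) -> None:
    store[user_id].append(fact)          # writes land only in the caller's namespace

def recall(user_id: str, query: str) -> str:
    matches = [f for f in store[user_id] if query.lower() in f.lower()]
    return matches[0] if matches else "I don't have that information"

remember("user_a", "password is secret123")
print(recall("user_b", "password"))   # -> "I don't have that information"
print(recall("user_a", "password"))   # -> "password is secret123"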

Token Savings with Synopsis

Without a synopsis, every request includes the full conversation history:

100 messages × 100 tokens = 10,000 tokens per request

With a synopsis:

Synopsis: ~300 tokens
Savings: 97% reduction!

For users with long histories, this dramatically reduces costs.
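
The arithmetic behind those numbers:

full_history_tokens = 100 * 100            # 100 messages x ~100 tokens each
synopsis_tokens = 300                      # typical synthesized synopsis
savings = 1 - synopsis_tokens / full_history_tokens
print(f"{savings:.0%} reduction")          # -> 97% reduction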

Configuration

Basic Setup (Single User)

memory:
  buffer:
    size: 50
  working:
    provider: faissx

Production Setup (Multi-Tenant)

memory:
  buffer:
    size: 50

  working:
    provider: faissx
    server: "tcp://faissx.internal:45678"

  persistent:
    provider: postgres
    connection_string: ${{ secrets.POSTGRES_URL }}
    user_synopsis:
      enabled: true
      cache_ttl: 3600

Summary

Layer           What It Stores                When It's Used
--------------  ----------------------------  -----------------
Buffer          Recent messages               Immediate context
Working         Session state, tool outputs   Current task
User Synopsis   Who the user is               Every request
Persistent      Long-term facts               Returning users

Key points:

  • FAISSx for vector storage (can be shared across instances)
  • Postgres required for multi-tenancy
  • User synopsis reduces token costs dramatically
  • All memory is user-isolated in production

Learn More