Memory Internals
Technical deep dive into MUXI's three-tier memory system
This page covers the data structures, algorithms, caching strategies, and performance characteristics behind that system.
Architecture Overview
┌───────────────────────────────────────────────────────┐
│                    Memory Manager                     │
├───────────────────────────────────────────────────────┤
│ ┌───────────────┐  ┌─────────────┐  ┌───────────────┐ │
│ │    Buffer     │  │   Working   │  │  Persistent   │ │
│ │    Memory     │  │   Memory    │  │    Memory     │ │
│ │               │  │             │  │               │ │
│ │   In-memory   │  │   FAISSx    │  │    SQLite/    │ │
│ │  ring buffer  │  │   vectors   │  │  PostgreSQL   │ │
│ └──────┬────────┘  └──────┬──────┘  └────────┬──────┘ │
│        └──────────────────┼──────────────────┘        │
│              ┌───────────┴───────────┐                │
│              │     Query Planner     │                │
│              │ (semantic + recency)  │                │
│              └───────────────────────┘                │
└───────────────────────────────────────────────────────┘
Buffer Memory
Implementation
from collections import deque

class BufferMemory:
    def __init__(self, size=50, multiplier=10):
        self.messages = deque(maxlen=size)                # ring buffer of raw messages
        self.summaries = deque(maxlen=size * multiplier)  # extended buffer of summaries
        self.embedding_cache = LRUCache(1000)             # LRU cache, e.g. cachetools.LRUCache
Behavior
| Operation | Complexity | Notes |
|---|---|---|
| Add message | O(1) | Append to ring buffer |
| Get recent N | O(N) | Slice from deque |
| Overflow | O(k) | Summarize k oldest messages |
Summarization
When the buffer exceeds its size:
- Take the k oldest messages (one batch)
- The LLM summarizes them into a single entry
- The summary moves to the extended buffer
- The original messages are dropped
Messages: [1, 2, 3, ..., 50, 51] ← overflow!
↓
Summarize [1-10] → "User discussed project setup..."
↓
Messages: [11, 12, ..., 50, 51]
Summaries: ["User discussed project setup..."]
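The overflow path above can be sketched in a few lines; `handle_overflow` is an illustrative helper, and `summarize` stands in for the LLM call rather than MUXI's actual API:

```python
from collections import deque

def summarize(messages):
    # Stand-in for the LLM summarization call
    return f"Summary of {len(messages)} messages"

def handle_overflow(messages: deque, summaries: deque, batch: int = 10):
    """Pop the `batch` oldest messages, summarize them into one
    entry, and move that entry to the extended summary buffer."""
    oldest = [messages.popleft() for _ in range(min(batch, len(messages)))]
    summaries.append(summarize(oldest))

# A buffer of 51 messages overflows a size-50 window:
messages = deque(range(1, 52))
summaries = deque(maxlen=500)
handle_overflow(messages, summaries)
# messages now starts at 11; summaries holds one entry
```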
Working Memory
Vector Store
MUXI uses FAISS for vector similarity search:
import faiss

class WorkingMemory:
    def __init__(self, dimension=1536):
        self.index = faiss.IndexFlatIP(dimension)  # inner product (cosine on normalized vectors)
        self.metadata = {}  # FAISS row id → metadata mapping
Indexing
New content (tool output, user info)
↓
Embed with LLM
↓
Add to FAISS index
↓
Store metadata (timestamp, source, user_id)
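The index-plus-metadata pattern can be shown without FAISS itself; this is a pure-Python stand-in that brute-forces inner-product similarity over a list (the real store delegates this step to the FAISS index):

```python
import time

class ToyIndex:
    """Minimal stand-in for a FAISS inner-product index:
    stores vectors in a list and brute-forces similarity."""
    def __init__(self):
        self.vectors = []
        self.metadata = {}

    def add(self, vec, meta):
        idx = len(self.vectors)
        self.vectors.append(vec)
        # Metadata lives beside the vector, as in WorkingMemory
        self.metadata[idx] = {**meta, "timestamp": time.time()}
        return idx

    def search(self, query_vec, k=5):
        scores = [
            (sum(a * b for a, b in zip(query_vec, v)), i)
            for i, v in enumerate(self.vectors)
        ]
        scores.sort(reverse=True)  # highest inner product first
        return [self.metadata[i] for _, i in scores[:k]]

index = ToyIndex()
index.add([1.0, 0.0], {"source": "tool_output", "user_id": "u1"})
index.add([0.0, 1.0], {"source": "user_info", "user_id": "u1"})
top = index.search([0.9, 0.1], k=1)
# top[0]["source"] == "tool_output"
```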
Search
def search(self, query: str, k: int = 5) -> List[Memory]:
    query_vec = embed(query).reshape(1, -1)  # FAISS expects a 2-D batch of queries
    distances, indices = self.index.search(query_vec, k)
    # FAISS pads with -1 when fewer than k vectors are indexed
    return [self.metadata[i] for i in indices[0] if i != -1]
Semantic similarity, not keyword matching.
FIFO Cleanup
When max_memory_mb exceeded:
def cleanup(self):
    while self.size_mb > self.max_memory_mb:
        oldest = self.get_oldest()
        self.remove(oldest.id)
Configurable interval:
memory:
  working:
    max_memory_mb: 10
    fifo_interval_min: 5
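A self-contained version of the FIFO eviction loop (sizes in MB are simulated here; the real store tracks actual vector memory usage):

```python
from collections import OrderedDict

class FifoStore:
    def __init__(self, max_memory_mb=10):
        self.max_memory_mb = max_memory_mb
        self.items = OrderedDict()  # insertion order doubles as age
        self.size_mb = 0.0

    def add(self, item_id, size_mb):
        self.items[item_id] = size_mb
        self.size_mb += size_mb
        self.cleanup()

    def cleanup(self):
        # Evict oldest entries until back under the limit
        while self.size_mb > self.max_memory_mb and self.items:
            _, evicted_size = self.items.popitem(last=False)
            self.size_mb -= evicted_size

store = FifoStore(max_memory_mb=10)
for i in range(6):
    store.add(f"mem-{i}", size_mb=3)  # 18 MB total triggers eviction
# store retains only the 3 newest items (9 MB)
```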
Persistent Memory
Dynamic Embedding Dimensions
Persistent memory uses dimension-specific tables rather than a single fixed table. The table name is derived from the embedding model's output dimension:
| Embedding Model | Dimension | Table Name |
|---|---|---|
| openai/text-embedding-3-small | 1536 | memories_1536 |
| openai/text-embedding-3-large | 3072 | memories_3072 |
| local/all-MiniLM-L6-v2 | 384 | memories_384 |
| local/all-mpnet-base-v2 | 768 | memories_768 |
This is handled by the get_memory_model(dimension) factory, which dynamically creates SQLAlchemy ORM models:
def get_memory_model(dimension: int):
    tablename = f"memories_{dimension}"
    # Returns a dynamically-created SQLAlchemy model class
    # with a Vector(dimension) column matching the embedding size
Multiple formations sharing the same database can each use a different embedding model; their tables won't conflict.
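The factory pattern can be illustrated without the SQLAlchemy/pgvector dependencies: create one class per dimension on demand and cache it, so repeat calls for the same dimension return the same class. This is a pure-Python sketch; the real factory attaches ORM columns, including Vector(dimension), instead of plain attributes:

```python
_model_cache = {}

def get_memory_model(dimension: int):
    """Return the model class for this dimension, creating and
    caching it on first use."""
    if dimension not in _model_cache:
        _model_cache[dimension] = type(
            f"Memory{dimension}",  # dynamic class name
            (object,),
            {"__tablename__": f"memories_{dimension}", "dimension": dimension},
        )
    return _model_cache[dimension]

Small = get_memory_model(384)
Large = get_memory_model(1536)
# Small.__tablename__ is "memories_384"; a second call returns the cached class
```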
Schema (PostgreSQL)
-- Table name varies by embedding dimension (e.g., memories_1536, memories_384)
CREATE TABLE memories_1536 (
    id UUID PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    session_id VARCHAR(255),
    content TEXT NOT NULL,
    embedding VECTOR(1536),     -- Matches the embedding model's dimension
    memory_type VARCHAR(50),    -- 'fact', 'preference', 'context'
    importance FLOAT DEFAULT 0.5,
    created_at TIMESTAMP DEFAULT NOW(),
    accessed_at TIMESTAMP DEFAULT NOW(),
    access_count INT DEFAULT 0
);
CREATE INDEX idx_user ON memories_1536(user_id);
CREATE INDEX idx_embedding ON memories_1536 USING ivfflat (embedding vector_cosine_ops);
CREATE INDEX idx_importance ON memories_1536(importance DESC);
Local Embedding Models
MUXI supports running embedding models locally via sentence-transformers, with no API key required. Use the local/ prefix:
llm:
  models:
    - embedding: "local/all-MiniLM-L6-v2"  # 384 dimensions
      # or: "local/all-mpnet-base-v2"      # 768 dimensions
The model is downloaded automatically on first use. This is useful for development, air-gapped environments, or reducing API costs.
Note: SQLite-backed formations automatically fall back to local embeddings when no API-based embedding model is configured.
Migrating Between Embedding Models
If you change your embedding model (e.g., from openai/text-embedding-3-small at 1536 dims to local/all-MiniLM-L6-v2 at 384 dims), existing memories need re-embedding. Use the migration script:
python scripts/migrate_embeddings.py \
  --from-dim 1536 \
  --to-dim 384 \
  --to-model "local/all-MiniLM-L6-v2" \
  --connection-string "postgresql://user:pass@localhost/muxi"
The script reads from memories_{from_dim}, re-embeds each memory with the target model, and inserts into memories_{to_dim}. The source table is preserved (not deleted).
Importance Scoring
Memories are scored for importance:
from math import log

def calculate_importance(memory: Memory) -> float:
    recency = time_decay(memory.created_at)  # 1.0 = just created, decays toward 0
    access_frequency = log(memory.access_count + 1)
    explicit_importance = memory.explicit_importance or 0.5
    return (0.3 * recency +
            0.3 * access_frequency +
            0.4 * explicit_importance)
Higher importance = kept longer, surfaced more often.
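Plugging in numbers makes the weighting concrete (assuming time_decay returns 1.0 for a brand-new memory; inputs are flattened to plain floats for the example):

```python
from math import log

def calculate_importance(recency, access_count, explicit_importance=0.5):
    # Same weights as the formula above
    return 0.3 * recency + 0.3 * log(access_count + 1) + 0.4 * explicit_importance

fresh = calculate_importance(recency=1.0, access_count=0)
# 0.3*1.0 + 0.3*log(1) + 0.4*0.5 = 0.5

popular = calculate_importance(recency=0.5, access_count=20, explicit_importance=0.9)
# log(21) is about 3.04, so heavily-accessed memories can score above 1.0;
# callers may want to clamp or normalize the frequency term
```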
User Synopsis Caching
Two-Tier Cache
┌─────────────────────────────────────┐
│           Identity Cache            │  TTL: 7 days
│     (name, preferences, traits)     │
└─────────────────────────────────────┘
                   ↓
┌─────────────────────────────────────┐
│            Context Cache            │  TTL: 1 hour
│   (recent topics, current goals)    │
└─────────────────────────────────────┘
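A minimal TTL cache illustrates the two-tier expiry behavior; the clock is injected for testability, and the real cache backend is not specified here:

```python
import time

class TTLCache:
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key → (value, expires_at)

    def set(self, key, value, ttl):
        self._store[key] = (value, self._clock() + ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # lazy expiry on read
            return None
        return value

# Identity entries live 7 days, context entries 1 hour:
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("identity:alice", {"name": "Alice"}, ttl=7 * 24 * 3600)
cache.set("context:alice", {"topic": "memory docs"}, ttl=3600)
now[0] = 7200  # two hours later: context expired, identity survives
```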
Synopsis Generation
def generate_synopsis(user_id: str) -> Synopsis:
    # Gather all user memories
    memories = persistent.get_all(user_id)

    # LLM synthesizes into structured synopsis
    synopsis = llm.synthesize(
        prompt="Create a user profile from these memories...",
        memories=memories
    )

    # Cache with appropriate TTL
    cache.set(f"identity:{user_id}", synopsis.identity, ttl=7*24*3600)
    cache.set(f"context:{user_id}", synopsis.context, ttl=3600)
    return synopsis
Token Savings
| Approach | Tokens per Request |
|---|---|
| Full history | 5,000-50,000 |
| Synopsis only | 500-1,000 |
| Savings | 80-95% |
Query Planning
When an agent needs context, the Query Planner coordinates all tiers:
def get_context(query: str, user_id: str) -> Context:
    # 1. Always include recent buffer
    recent = buffer.get_recent(10)

    # 2. Semantic search across all tiers
    working_results = working.search(query, k=5)
    persistent_results = persistent.search(query, user_id, k=5)

    # 3. Get user synopsis
    synopsis = cache.get(f"synopsis:{user_id}") or generate_synopsis(user_id)

    # 4. Rank and deduplicate
    all_results = rank_and_dedupe(
        recent + working_results + persistent_results
    )

    # 5. Fit within context window
    return fit_to_window(synopsis, all_results, max_tokens=4000)
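The rank_and_dedupe step is not detailed above; one plausible sketch deduplicates by content (keeping the highest-scoring copy) and orders best-first. The 'content' and 'score' dict keys are assumptions for illustration:

```python
def rank_and_dedupe(results):
    """Drop duplicate contents, keeping the highest-scoring copy,
    then sort best-first."""
    best = {}
    for r in results:
        key = r["content"]
        if key not in best or r["score"] > best[key]["score"]:
            best[key] = r
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)

merged = rank_and_dedupe([
    {"content": "likes Python", "score": 0.9},  # from working memory
    {"content": "likes Python", "score": 0.7},  # duplicate from persistent
    {"content": "project: MUXI", "score": 0.8},
])
# two entries remain, 'likes Python' (0.9) first
```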
Multi-User Isolation
Implementation
Every memory operation includes user_id:
def add_memory(content: str, user_id: str):
    # User ID is part of the primary key namespace
    memory = Memory(
        id=f"{user_id}:{uuid4()}",
        user_id=user_id,
        content=content
    )
    self.store(memory)

def search(query: str, user_id: str):
    # Always filter by user_id
    return self.index.search(
        query,
        filter={"user_id": user_id}
    )
No cross-user access possible at the data layer.
Performance Characteristics
| Operation | Latency | Notes |
|---|---|---|
| Buffer add | <1ms | In-memory deque |
| Buffer search | 1-5ms | Linear scan, small N |
| Working search | 5-20ms | FAISS similarity |
| Persistent search | 20-100ms | PostgreSQL + pgvector |
| Synopsis generation | 500-2000ms | LLM call (cached) |
| Full context build | 50-200ms | All tiers + ranking |
Configuration Reference
memory:
  buffer:
    size: 50             # Messages before summarization
    multiplier: 10       # Extended capacity factor
    vector_search: true  # Enable semantic search
    embedding_model: openai/text-embedding-3-small  # Or local/all-MiniLM-L6-v2
  working:
    max_memory_mb: 10      # Memory limit
    fifo_interval_min: 5   # Cleanup frequency
  persistent:
    enabled: true
    provider: postgresql
    connection_string: ${{ secrets.POSTGRES_URI }}
    user_isolation: true
  synopsis:
    identity_ttl: 604800   # 7 days
    context_ttl: 3600      # 1 hour
    refresh_on_access: true
Next Steps
- Memory Concepts - High-level overview
- Memory Configuration - YAML reference
- Multi-Tenancy - User isolation details