# Memory Internals

A technical deep dive into MUXI's three-tier memory system: the data structures, algorithms, caching strategies, and performance characteristics behind it.
## Architecture Overview
```
┌─────────────────────────────────────────────────────────┐
│                     Memory Manager                      │
├─────────────────────────────────────────────────────────┤
│  ┌───────────────┐  ┌─────────────┐  ┌───────────────┐  │
│  │    Buffer     │  │   Working   │  │  Persistent   │  │
│  │    Memory     │  │   Memory    │  │    Memory     │  │
│  │               │  │             │  │               │  │
│  │   In-memory   │  │   FAISSx    │  │    SQLite/    │  │
│  │  ring buffer  │  │   vectors   │  │  PostgreSQL   │  │
│  └───────┬───────┘  └──────┬──────┘  └───────┬───────┘  │
│          └─────────────────┼─────────────────┘          │
│             ┌──────────────┴──────────┐                 │
│             │      Query Planner      │                 │
│             │  (semantic + recency)   │                 │
│             └─────────────────────────┘                 │
└─────────────────────────────────────────────────────────┘
```
## Buffer Memory

### Implementation
```python
from collections import deque

class BufferMemory:
    def __init__(self, size=50, multiplier=10):
        self.messages = deque(maxlen=size)                 # recent messages (ring buffer)
        self.summaries = deque(maxlen=size * multiplier)   # summarized history
        self.embedding_cache = LRUCache(1000)              # cached embeddings
```
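The `LRUCache` used for embeddings is not shown in the source; a minimal sketch of the usual least-recently-used policy, built on `OrderedDict` (an assumed stand-in, not MUXI's implementation):

```python
from collections import OrderedDict

class LRUCache:
    """Minimal LRU cache: evicts the least-recently-used entry at capacity."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data: OrderedDict = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)  # mark as most recently used
        return self._data[key]

    def set(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # drop the least-recently-used entry
```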
### Behavior
| Operation | Complexity | Notes |
|---|---|---|
| Add message | O(1) | Append to ring buffer |
| Get recent N | O(N) | Slice from deque |
| Overflow | O(k) | Summarize k oldest messages |
### Summarization

When the buffer exceeds `size`:

1. Take the oldest k messages as a batch
2. The LLM summarizes the batch into a single entry
3. The summary is moved to the extended buffer
4. The original messages are dropped
```
Messages:  [1, 2, 3, ..., 50, 51]   ← overflow!
            ↓
Summarize [1-10] → "User discussed project setup..."
            ↓
Messages:  [11, 12, ..., 50, 51]
Summaries: ["User discussed project setup..."]
```
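The overflow flow above can be sketched as a small helper. This is a simplified illustration, not MUXI's code: `summarize` stands in for the LLM call, and `add_with_overflow` is a hypothetical name:

```python
from collections import deque

def summarize(messages):
    # Stand-in for the LLM summarization call; joins content for illustration.
    return "Summary: " + "; ".join(messages)

def add_with_overflow(messages: deque, summaries: deque, new_msg: str,
                      size: int = 50, batch: int = 10):
    """Append a message; if the buffer is full, summarize the oldest batch first."""
    if len(messages) >= size:
        oldest = [messages.popleft() for _ in range(batch)]
        summaries.append(summarize(oldest))  # summary moves to the extended buffer
    messages.append(new_msg)                 # originals above are dropped
```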
## Working Memory

### Vector Store

MUXI uses FAISS for vector similarity search:
```python
import faiss

class WorkingMemory:
    def __init__(self, dimension=1536):
        self.index = faiss.IndexFlatIP(dimension)  # inner-product similarity
        self.metadata = {}                         # id → metadata mapping
```
### Indexing

```
New content (tool output, user info)
        ↓
Embed with LLM
        ↓
Add to FAISS index
        ↓
Store metadata (timestamp, source, user_id)
```
### Search

```python
def search(self, query: str, k: int = 5) -> List[Memory]:
    query_vec = embed(query).reshape(1, -1)  # FAISS expects a 2-D (n, d) array
    distances, indices = self.index.search(query_vec, k)
    return [self.metadata[i] for i in indices[0]]
```
Semantic similarity, not keyword matching.
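To make the inner-product search concrete, here is a brute-force NumPy sketch of what `faiss.IndexFlatIP` computes, using toy 2-D "embeddings" (real FAISS is optimized and uses 1536-D vectors; this illustrates only the math):

```python
import numpy as np

def top_k_inner_product(index_vectors: np.ndarray, query: np.ndarray, k: int):
    """Brute-force inner-product search: the operation IndexFlatIP performs."""
    scores = index_vectors @ query           # one dot product per stored vector
    order = np.argsort(-scores)[:k]          # highest inner product first
    return order, scores[order]

# Toy embeddings: each row is a stored memory vector.
vecs = np.array([[1.0, 0.0],
                 [0.0, 1.0],
                 [0.7, 0.7]])
ids, scores = top_k_inner_product(vecs, np.array([1.0, 0.1]), k=2)
```

The query vector is closest (by inner product) to the first and third stored vectors, so those are returned, regardless of any keyword overlap.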
### FIFO Cleanup

When `max_memory_mb` is exceeded, the oldest entries are evicted first:

```python
def cleanup(self):
    while self.size_mb > self.max_memory_mb:
        oldest = self.get_oldest()
        self.remove(oldest.id)
```

The cleanup interval is configurable:

```yaml
memory:
  working:
    max_memory_mb: 10
    fifo_interval_min: 5
```
## Persistent Memory

### Schema (PostgreSQL)

```sql
CREATE TABLE memories (
    id UUID PRIMARY KEY,
    user_id VARCHAR(255) NOT NULL,
    session_id VARCHAR(255),
    content TEXT NOT NULL,
    embedding VECTOR(1536),
    memory_type VARCHAR(50),            -- 'fact', 'preference', 'context'
    importance FLOAT DEFAULT 0.5,
    created_at TIMESTAMP DEFAULT NOW(),
    accessed_at TIMESTAMP DEFAULT NOW(),
    access_count INT DEFAULT 0
);

CREATE INDEX idx_user ON memories(user_id);
-- pgvector's ivfflat requires an explicit operator class;
-- inner-product ops match the FAISS IP search above
CREATE INDEX idx_embedding ON memories USING ivfflat (embedding vector_ip_ops);
CREATE INDEX idx_importance ON memories(importance DESC);
```
### Importance Scoring

Memories are scored for importance:
```python
from math import log

def calculate_importance(memory: Memory) -> float:
    recency = time_decay(memory.created_at)          # newer → closer to 1.0
    access_frequency = log(memory.access_count + 1)  # diminishing returns
    explicit_importance = memory.explicit_importance or 0.5
    return (0.3 * recency +
            0.3 * access_frequency +
            0.4 * explicit_importance)
```
Higher importance = kept longer, surfaced more often.
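A worked example of the weighted sum, with a concrete `time_decay` and a frequency normalization. Both are assumptions for illustration (the source does not define `time_decay` or a normalization), chosen so every term stays in [0, 1]:

```python
import math

def time_decay(age_days: float, half_life_days: float = 30.0) -> float:
    """Assumed exponential decay in [0, 1]: halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)

def calculate_importance(age_days: float, access_count: int,
                         explicit: float = 0.5) -> float:
    recency = time_decay(age_days)
    # Normalize log-frequency into [0, 1] (caps at ~100 accesses); an assumption.
    frequency = min(math.log(access_count + 1) / math.log(101), 1.0)
    return 0.3 * recency + 0.3 * frequency + 0.4 * explicit
```

A brand-new, never-accessed memory with default explicit importance scores 0.3·1.0 + 0.3·0.0 + 0.4·0.5 = 0.5, and the score decays as the memory ages.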
## User Synopsis Caching

### Two-Tier Cache
```
┌─────────────────────────────────────┐
│           Identity Cache            │  TTL: 7 days
│     (name, preferences, traits)     │
└─────────────────────────────────────┘
                  ↓
┌─────────────────────────────────────┐
│            Context Cache            │  TTL: 1 hour
│    (recent topics, current goals)   │
└─────────────────────────────────────┘
```
### Synopsis Generation

```python
def generate_synopsis(user_id: str) -> Synopsis:
    # Gather all user memories
    memories = persistent.get_all(user_id)

    # LLM synthesizes them into a structured synopsis
    synopsis = llm.synthesize(
        prompt="Create a user profile from these memories...",
        memories=memories
    )

    # Cache each tier with its own TTL
    cache.set(f"identity:{user_id}", synopsis.identity, ttl=7*24*3600)
    cache.set(f"context:{user_id}", synopsis.context, ttl=3600)
    return synopsis
```
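The `cache` object with per-key TTLs can be sketched in a few lines. This is an assumed minimal in-process implementation with lazy expiry on read, not MUXI's cache (which may be external, e.g. shared across processes):

```python
import time

class TTLCache:
    """Minimal TTL cache: each entry expires ttl seconds after set()."""

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl: float):
        self._data[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazy expiry: remove on read
            return None
        return value
```

Identity entries would be set with `ttl=7*24*3600` and context entries with `ttl=3600`, matching the two tiers above.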
### Token Savings
| Approach | Tokens per Request |
|---|---|
| Full history | 5,000-50,000 |
| Synopsis only | 500-1,000 |
| Savings | 80-95% |
## Query Planning

When an agent needs context, the Query Planner coordinates all tiers:
```python
def get_context(query: str, user_id: str) -> Context:
    # 1. Always include the most recent buffer messages
    recent = buffer.get_recent(10)

    # 2. Semantic search across working and persistent tiers
    working_results = working.search(query, k=5)
    persistent_results = persistent.search(query, user_id, k=5)

    # 3. Get the cached user synopsis (regenerate on a miss)
    synopsis = cache.get(f"synopsis:{user_id}") or generate_synopsis(user_id)

    # 4. Rank and deduplicate
    all_results = rank_and_dedupe(
        recent + working_results + persistent_results
    )

    # 5. Fit within the context window
    return fit_to_window(synopsis, all_results, max_tokens=4000)
```
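`rank_and_dedupe` and `fit_to_window` are not defined in the source; a simplified sketch of plausible behavior, assuming results carry a `score` field and using word count as a crude token proxy:

```python
def rank_and_dedupe(results):
    """Sort by score descending; keep the first occurrence of each content string."""
    seen, ranked = set(), []
    for item in sorted(results, key=lambda r: r["score"], reverse=True):
        if item["content"] not in seen:
            seen.add(item["content"])
            ranked.append(item)
    return ranked

def fit_to_window(items, max_tokens, tokens=lambda i: len(i["content"].split())):
    """Greedily keep top-ranked items until the token budget is exhausted."""
    kept, used = [], 0
    for item in items:
        cost = tokens(item)
        if used + cost <= max_tokens:
            kept.append(item)
            used += cost
    return kept
```

A real implementation would use the model's tokenizer rather than word counts, and might merge near-duplicates by embedding similarity rather than exact string match.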
## Multi-User Isolation

### Implementation

Every memory operation includes a `user_id`:

```python
def add_memory(content: str, user_id: str):
    # The user ID is part of the primary-key namespace
    memory = Memory(
        id=f"{user_id}:{uuid4()}",
        user_id=user_id,
        content=content
    )
    self.store(memory)

def search(query: str, user_id: str):
    # Always filter by user_id
    return self.index.search(
        query,
        filter={"user_id": user_id}
    )
```

No cross-user access is possible at the data layer.
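The isolation guarantee can be demonstrated with a toy in-memory store (an illustrative sketch with substring search standing in for vector search; `IsolatedStore` is a hypothetical name):

```python
from uuid import uuid4

class IsolatedStore:
    """Toy store keyed by user_id: reads only ever see the caller's namespace."""

    def __init__(self):
        self._by_user = {}  # user_id -> {memory_id: content}

    def add(self, user_id: str, content: str) -> str:
        memory_id = f"{user_id}:{uuid4()}"  # user_id prefixes the key namespace
        self._by_user.setdefault(user_id, {})[memory_id] = content
        return memory_id

    def search(self, user_id: str, term: str):
        # The filter lives inside the data layer, not in the caller.
        return [c for c in self._by_user.get(user_id, {}).values() if term in c]
```

Because `search` only reaches into the caller's own partition, one user's query can never return another user's memories, even with an identical search term.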
## Performance Characteristics
| Operation | Latency | Notes |
|---|---|---|
| Buffer add | <1ms | In-memory deque |
| Buffer search | 1-5ms | Linear scan, small N |
| Working search | 5-20ms | FAISS similarity |
| Persistent search | 20-100ms | PostgreSQL + pgvector |
| Synopsis generation | 500-2000ms | LLM call (cached) |
| Full context build | 50-200ms | All tiers + ranking |
## Configuration Reference

```yaml
memory:
  buffer:
    size: 50                  # Messages before summarization
    multiplier: 10            # Extended capacity factor
    vector_search: true       # Enable semantic search
    embedding_model: openai/text-embedding-3-small
  working:
    max_memory_mb: 10         # Memory limit
    fifo_interval_min: 5      # Cleanup frequency
    vector_dimension: 1536    # Embedding dimension
  persistent:
    enabled: true
    provider: postgresql
    connection_string: ${{ secrets.POSTGRES_URI }}
    user_isolation: true
  synopsis:
    identity_ttl: 604800      # 7 days
    context_ttl: 3600         # 1 hour
    refresh_on_access: true
```
## Next Steps
- Memory Concepts - High-level overview
- Memory Configuration - YAML reference
- Multi-Tenancy - User isolation details